Multiple sclerosis (MS) is a demyelinating disease
of the central nervous system (CNS) whose complete
cause remains unknown.

Numerous studies have supported the hypothesis
of a viral etiology of the disease, but none of
the known viruses tested has proved to be the causative
agent sought: a review of the viruses sought in MS
over many years has been carried out by E. Norrby and R.T.
Johnson.

Recently, a retrovirus, different from the known
human retroviruses, was isolated from patients suffering
from MS. The authors were able to show that this
retrovirus could be transmitted in vitro, that patients
suffering from MS produced antibodies capable of
recognizing proteins associated with the infection of the
leptomeningeal cells by this retrovirus, and that the
expression of the latter could be greatly stimulated by
the immediate-early genes of some herpesviruses.

All these results argue in favor of the role in MS
of at least one unknown retrovirus or of a virus having a
reverse transcriptase (RT) activity which is detectable by
the method published by H. Perron and termed "LM7-type RT"
activity.

The studies by the applicant have made it possible
to obtain two continuous cell lines infected with natural
isolates obtained from two different patients suffering
from MS, by a culture method as described in the document
WO-A-93 20188, whose content is incorporated by reference
into the present description. These two lines derived from
cells of human choroid plexus, called LM7PC and PLI-2,
were deposited at the E.C.A.C.C. on 22 July 1992 and 8
January 1993, respectively, under numbers 92 072201 and 93
010817, in accordance with the provisions of the Treaty of
Budapest. Moreover, the viral isolates possessing an
LM7-type RT activity have also been deposited at the
E.C.A.C.C. under the overall name of "strains". The
"strain" or isolate harbored by the PLI-2 line, called
POL-2, was deposited at the E.C.A.C.C. on 22 July 1992
under No. V92072202. The "strain" or isolate harbored by
the LM7PC line, called MS7PG, was deposited at the
E.C.A.C.C. on 8 January 1993 under No. V93010816.

Using the above-mentioned cultures and isolates,
characterized by biological and morphological criteria,
efforts were then made to characterize the genetic
material associated with the viral particles produced in
these cultures.

The portions of the genome already characterized
were used to develop molecular detection tests for the
viral genome and immunoserological tests, using the amino
acid sequences encoded by the nucleotide sequences of the
viral genome, in order to detect the immune response
directed against epitopes associated with the viral
infection and/or expression.

These tools have already made it possible to
confirm an association between MS and the expression of
the sequences identified in the patents cited further on.
However, the viral system discovered by the applicant is
related to a complex retroviral system. Indeed, the
sequences which are found to be encapsidated in the
extracellular viral particles produced by the different
cultures of cells of patients suffering from MS show
clearly that there is co-encapsidation of retroviral
genomes which are related to, but different from, the
"wild-type" retroviral genome which produces the
infectious viral particles. This phenomenon has been observed
between replicative retroviruses and endogenous
retroviruses belonging to the same family, or even
heterologous retroviruses. The concept of endogenous
retrovirus is very important in the context of our
discovery because, in the case of MSRV-1, it has been
observed that endogenous retroviral sequences comprising
sequences homologous to the MSRV-1 genome exist in normal
human DNA. The existence of endogenous retroviral elements
(ERV) related to MSRV-1 through all or part of their
genome explains the fact that the expression of the MSRV-1
retrovirus in human cells can interact with related
endogenous sequences. These interactions are found in the
case of pathogenic and/or infectious endogenous
retroviruses (for example some ecotropic strains of the
Murine Leukemia virus), and in the case of exogenous
retroviruses whose nucleotide sequence may be found
partially or completely in the form of ERVs in the genome
of the host animal (e.g. mouse mammary tumor exogenous
virus transmitted via milk). These interactions consist
mainly of (i) a transactivation or co-activation of ERVs
by the replicative retrovirus, (ii) an "illegitimate"
encapsidation of related RNAs of ERVs, or of ERVs - or
even of cellular RNAs - simply possessing compatible
encapsidation sequences, into the retroviral particles
produced by the expression of the replicative strain,
which are sometimes transmissible and sometimes with an
inherent pathogenicity, and (iii) relatively frequent
recombinations between the co-encapsidated genomes, in
particular in the reverse transcription phases, which lead
to the formation of hybrid genomes, which are sometimes
transmissible and sometimes with an inherent
pathogenicity.

Thus, (i) various MSRV-1-related sequences have
been found in purified viral particles; (ii) molecular
analysis of the various regions of the MSRV-1 retroviral
genome should be carried out by systematically analyzing
the co-encapsidated, interfering and/or recombinant
sequences which are generated by the infection and/or
expression of MSRV-1; furthermore, some clones may have
portions of defective sequences produced by the retroviral
replication and the template and/or transcription errors
caused by reverse transcriptase; (iii) the families of
sequences related to the same retroviral genomic region
are the supports for an overall diagnostic detection which
may be optimized by the identification of invariable
regions among the clones expressed and by the
identification of reading frames responsible for the
production of antigenic and/or pathogenic polypeptides
which may only be produced by a portion, or even only one,
of the clones expressed; under these conditions, the
systematic analysis of the clones expressed in one region
of a given gene makes it possible to evaluate the
frequency of variation and/or recombination of the MSRV-1
genome in this region and to define the optimum sequences
for the applications, in particular the diagnostic
applications; (iv) the pathology caused by a retrovirus
such as MSRV-1 may be a direct effect of its expression
and of the proteins or peptides produced as a result, but
also an effect of the activation, encapsidation or
recombination of related or heterologous genomes and of the
proteins or peptides produced as a result; thus, these
genomes associated with the expression and/or infection by
MSRV-1 are an integral part of the potential pathogenicity
of this virus and therefore constitute diagnostic
detection supports and particular therapeutic targets.
Likewise, any agent which is associated with, or which is
a cofactor for, these interactions responsible for the
pathogenicity in question, such as MSRV-2 or the gliotoxic
factor described in the patent application published under
the No. FR-2,716,198, can participate in the development
of an overall and very effective strategy for therapeutic
diagnosis, prognosis, monitoring and/or integrated therapy
for MS in particular, but also for any other disease
associated with the same agents.

In this context, a parallel discovery has been
made in another autoimmune disease, rheumatoid arthritis
(RA), which has been described in the French patent
application filed under the No. 95 02960. This discovery
shows that, by applying methodological approaches similar
to those which were used in the studies by the applicant
on MS, it has been possible to identify a retrovirus
expressed in RA which shares the sequences described for
MSRV-1 in MS and also the coexistence of an
MSRV-2-associated sequence which is also described in MS.
As regards MSRV-1, the sequences commonly detected in MS
and RA relate to the pol and gag genes. On the basis of
current knowledge, it is possible to combine the gag and
pol sequences described with the MSRV-1 strains expressed
in these two diseases.

The present patent application relates to
various results which supplement those
already protected by the French patent applications:

No. 92/04322 of 03.04.1992, published under No. 2,689,519;

No. 92/13447 of 03.11.1992, published under No. 2,689,521;

No. 92/13443 of 03.11.1992, published under No. 2,689,520;

No. 94/01529 of 04.02.1994, published under No. 2,715,936;

No. 94/01531 of 04.02.1994, published under No. 2,715,939;

No. 94/01530 of 04.02.1994, published under No. 2,715,936;

No. 94/01532 of 04.02.1994, published under No. 2,715,937;

No. 94/14322 of 24.11.1994, published under No. 2,727,428;

No. 94/15810 of 23.12.1994, published under No. 2,728,585;

and

Patent Application WO 97/06260.
The present invention relates, first of all, to a
nucleic material, which may consist of a retroviral
material, in isolated or purified state, which may be
understood or characterized in various ways:

- it comprises a nucleotide sequence chosen from
the group which consists of (i) the sequences SEQ ID NO:
112, SEQ ID NO: 114, SEQ ID NO: 117, SEQ ID NO: 120, SEQ
ID NO: 124, SEQ ID NO: 130, SEQ ID NO: 141 and SEQ ID NO:
142; (ii) the sequences complementary to sequences (i);
and (iii) the sequences equivalent to sequences (i) or
(ii), in particular the sequences having, for every series
of 100 contiguous monomers, at least 50%, and
preferentially at least 70% homology with sequences (i) or
(ii) respectively;

- it encodes a polypeptide having, for every
contiguous series of at least 30 amino acids, at least
50%, and preferably at least 70% homology with a peptide
sequence chosen from the group which consists of SEQ ID
NO: 113, SEQ ID NO: 115, SEQ ID NO: 118, SEQ ID NO: 121,
SEQ ID NO: 135 and SEQ ID NO: 137;

- its pol gene comprises a nucleotide sequence
identical or equivalent to a sequence chosen from the
group which consists of SEQ ID NO: 112, SEQ ID NO: 124 and
their complementary sequences;

- the 5' end of its pol gene starts at nucleotide
1419 of SEQ ID NO: 130;

- its pol gene encodes a polypeptide having, for
every contiguous series of at least 30 amino acids, at
least 50%, and preferably at least 70% homology with the
peptide sequence SEQ ID NO: 113;

- the 3' end of its gag gene ends at nucleotide
1418 of SEQ ID NO: 130;

- its env gene comprises a nucleotide sequence
identical or equivalent to a sequence chosen from the
group which consists of SEQ ID NO: 117 and its
complementary sequences;

- its env gene comprises a nucleotide sequence
which starts at nucleotide 1 of SEQ ID NO: 117 and ends at
nucleotide 233 of SEQ ID NO: 114;

- its env gene encodes a polypeptide having, for
every contiguous series of at least 30 amino acids, at
least 50%, and preferably at least 70% homology with the
sequence SEQ ID NO: 118;

- the U3R region of its 3' LTR comprises a
nucleotide sequence which ends at nucleotide 617 of SEQ ID
NO: 114;

- the RU5 region of its 5' LTR comprises a
nucleotide sequence which starts at nucleotide 755 of SEQ
ID NO: 120 and ends at nucleotide 337 of SEQ ID NO: 141 or
SEQ ID NO: 142;

- it is a retroviral nucleic material comprising a
sequence which starts at nucleotide 755 of SEQ ID NO: 120
and which ends at nucleotide 617 of SEQ ID NO: 114;

- the retroviral nucleic material as defined above
is in particular associated with at least one autoimmune
disease such as multiple sclerosis or rheumatoid
arthritis.
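
By way of illustration of the "sequences complementary to sequences (i)" mentioned above, the complement of a DNA sequence, written 5' to 3', may be derived as sketched below. This is a minimal sketch and not part of the original description: the helper name `reverse_complement` is an assumption, only the four standard DNA bases are handled, and the input shown is the 5' primer SEQ ID NO: 69 quoted in Example 1.

```python
# Hypothetical helper, for illustration only: complement of a DNA strand.
# Assumes the sequence contains only A, C, G and T (spaces are ignored).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, written 5' to 3'."""
    cleaned = seq.replace(" ", "")
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return cleaned.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    primer_69 = "GCC ATC AAG CCA CCC AAG AAC TCT TAA CTT"
    print(reverse_complement(primer_69))
```

Applying the function twice returns the starting sequence, which is a convenient sanity check.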

The invention also relates to a nucleotide
fragment which corresponds to at least one of the
following definitions:

- it comprises or consists of a nucleotide
sequence chosen from the group which consists of (i)
the sequences SEQ ID NO: 112, SEQ ID NO: 114, SEQ
ID NO: 117, SEQ ID NO: 120, SEQ ID NO: 124, SEQ ID NO:
130, SEQ ID NO: 141 and SEQ ID NO: 142; (ii) the sequences
complementary to sequences (i); and (iii) the sequences
equivalent to sequences (i) or (ii), in particular the
sequences having, for every series of 100 contiguous
monomers, at least 50%, and preferentially at least 70%
homology with sequences (i) or (ii) respectively;

- it comprises or consists of a nucleotide
sequence encoding a polypeptide having, for every
contiguous series of at least 30 amino acids, at least
50%, and preferably at least 70% homology with a peptide
sequence chosen from the group which consists of SEQ ID
NO: 113, SEQ ID NO: 115, SEQ ID NO: 118, SEQ ID NO: 121,
SEQ ID NO: 135 and SEQ ID NO: 137.

Other subjects of the present invention are the
following:

- a nucleic probe for the detection of a
retrovirus associated with multiple sclerosis and/or
rheumatoid arthritis, capable of hybridizing specifically
with any fragment defined above and belonging to the
genome of said retrovirus; it advantageously possesses
from 10 to 100 nucleotides, preferably from 10 to 30
nucleotides;

- a primer for the amplification, by
polymerization, of an RNA or of a DNA of a retrovirus
associated with multiple sclerosis and/or rheumatoid
arthritis, which comprises a nucleotide sequence identical
or equivalent to at least a portion of the nucleotide
sequence of a fragment defined above, in particular a
nucleotide sequence having, for every series of 10
contiguous monomers, at least 50%, preferably at least 70%
homology with at least said portion of said fragment;
preferably the nucleotide sequence of a primer of the
invention is chosen from SEQ ID NO: 116, SEQ ID NO: 119,
SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID NO: 126, SEQ ID NO:
127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 132, and
SEQ ID NO: 133;

- an RNA or a DNA, and in particular a replication
and/or expression vector, comprising a genomic fragment of
the nucleic material or a fragment defined above;

- a peptide encoded by any open reading frame
belonging to a nucleotide fragment defined above, in
particular a polypeptide, for example an oligopeptide forming
or comprising an antigenic determinant recognized by sera
of patients infected with the MSRV-1 virus, and/or in whom
the MSRV-1 virus has been reactivated; a preferential
peptide comprises a sequence identical, partially or
completely, or equivalent to a sequence chosen from SEQ ID
NO: 113, SEQ ID NO: 115, SEQ ID NO: 118, SEQ ID NO: 121,
SEQ ID NO: 135 and SEQ ID NO: 137;

- a diagnostic, prophylactic or therapeutic
composition, in particular for inhibiting the expression
of at least one retrovirus associated with multiple
sclerosis and/or rheumatoid arthritis, comprising a
nucleotide fragment defined above;

- a method for detecting a retrovirus associated
with multiple sclerosis and/or rheumatoid arthritis, in a
biological sample, comprising the steps consisting of
bringing an RNA and/or a DNA assumed to belong to or
obtained from said retrovirus, or their complementary RNA
and/or DNA, into contact with a composition comprising a
nucleotide fragment defined above.

Before detailing the invention, various terms used
in the description and the claims are now defined:

- strain or isolate is understood to mean any
infectious and/or pathogenic biological fraction
containing, for example, viruses and/or bacteria and/or
parasites, generating a pathogenic and/or antigenic power,
harbored by a culture or a live host; by way of example, a
viral strain according to the preceding definition may
contain a co-infectious agent, for example a pathogenic
protist,

- the term "MSRV" used in the present description
designates any pathogenic and/or infectious agent
associated with MS, in particular a viral species, the
attenuated strains of said viral species, or the
interfering defective particles or particles containing
co-encapsidated genomes or alternatively genomes
recombined with a portion of the MSRV-1 genome, which are
derived from this species. It is known that viruses and
particularly viruses containing RNA exhibit variability,
resulting in particular from relatively high rates of
spontaneous mutation, which will be taken into account
below to define the concept of equivalence,

- human virus is understood to mean a virus
capable of infecting or of being harbored by human beings,

- given all the natural or induced variations
and/or recombination which may be encountered in practice
in the present invention, the objects thereof, defined
above and in the claims, have been expressed so as to include
the equivalents or derivatives of the various biological
materials defined below, in particular homologous
nucleotide or peptide sequences,

- the variant of a virus or of a pathogenic and/or
infectious agent according to the invention comprises at
least one antigen recognized by at least one antibody
directed against at least one corresponding antigen of
said virus and/or of said pathogenic and/or infectious
agent, and/or a genome in which any portion is detected by
at least one hybridization probe, and/or at least one
nucleotide amplification primer specific for said virus
and/or pathogenic and/or infectious agent, under defined
hybridization conditions well known to persons skilled in
the art,

- according to the invention, a nucleotide
fragment or an oligonucleotide or a polynucleotide is a
stretch of monomers, or a biopolymer, characterized by the
informational sequence of the natural nucleic acids, which
is capable of hybridizing to any other nucleotide fragment
under predefined conditions, it being possible for the
stretch to contain monomers of different chemical
structures and to be obtained from a natural nucleic acid
molecule and/or by genetic recombination and/or by
chemical synthesis; a nucleotide fragment may be identical
to a genomic fragment of the MSRV-1 virus considered by
the present invention, in particular a gene of the latter,
for example pol or env in the case of said virus;
- thus, a monomer may be a natural nucleic acid
nucleotide in which the constituent components are a
sugar, a phosphate group and a nitrogen base; in RNA, the
sugar is ribose; in DNA, the sugar is 2-deoxyribose;
depending on whether DNA or RNA is involved, the nitrogen
base is chosen from adenine, guanine, uracil, cytosine,
thymine; or the nucleotide may be modified in at least one
of the three constituent components; by way of example,
the modification may occur at the level of the bases,
generating modified bases such as inosine,
5-methyldeoxycytidine, deoxyuridine,
5-dimethylaminodeoxyuridine, 2,6-diaminopurine,
5-bromodeoxyuridine and any other modified base
promoting hybridization; at the level of the sugar, the
modification may consist in the replacement of at least
one deoxyribose with a polyamide, and at the level of the
phosphate group, the modification may consist in its
replacement with esters, in particular chosen from the
esters of diphosphate, of alkyl and arylphosphonate and of
phosphorothioate,

- "informational sequence" is understood to mean
any ordered series of monomers whose chemical nature and
order in a reference direction constitute, or
otherwise, a functional information of the same quality as
that of the natural nucleic acids,

- hybridization is understood to mean the process
during which, under appropriate operating conditions, two
nucleotide fragments, having sufficiently complementary
sequences, become annealed to form a complex structure, in
particular a double or triple structure, preferably in
helical form,

- a probe comprises a nucleotide fragment
synthesized by the chemical route or obtained by digestion
or enzymatic cleavage of a longer nucleotide fragment,
comprising at least six monomers, advantageously from 10
to 100 monomers, preferably 10 to 30 monomers, and
possessing a hybridization specificity under defined
conditions; preferably, a probe possessing less than 10
monomers is not used alone, but is used in the presence of
other probes which are equally short in length or
otherwise; under certain specific conditions, it may be
useful to use probes which are greater than 100 monomers
in size; a probe may be used in particular for diagnostic
purposes, and it may be, for example, a capture and/or
detection probe,

- the capture probe may be immobilized on a solid
support by any appropriate means, that is to say directly
or indirectly, for example by covalent bonding or passive
adsorption,

- the detection probe may be labeled by means of a
marker chosen in particular from radioactive isotopes,
enzymes chosen in particular from peroxidase and alkaline
phosphatase and those capable of hydrolyzing a
chromogenic, fluorigenic or luminescent substrate,
chromophoric chemical compounds, chromogenic, fluorigenic
or luminescent compounds, analogs of nucleotide bases, and
biotin,

- the probes used for diagnostic purposes of the
invention may be used in all known hybridization
techniques, and in particular the so-called "DOT-BLOT"
technique, the "SOUTHERN BLOT" technique, the "NORTHERN BLOT"
technique, which is identical to the "SOUTHERN
BLOT" technique but uses RNA as target, and the SANDWICH
technique; advantageously, the SANDWICH technique is used
in the present invention, comprising a specific capture
probe and/or a specific detection probe, it being
understood that the capture probe and the detection probe
must have a nucleotide sequence which is at least
partially different,

- any probe according to the present invention may
hybridize in vivo or in vitro with the RNA and/or with the
DNA, in order to block the replication phenomena, in particular
translation and/or transcription, and/or to
degrade said DNA and/or RNA,

- a primer is a probe comprising at least six
monomers, and advantageously from 10 to 30 monomers,
possessing hybridization specificity under defined
conditions, for the initiation of an enzymatic
polymerization, for example in an amplification technique
such as PCR (Polymerase Chain Reaction), in an extension
method such as sequencing, in a reverse transcription
method and the like,

- two nucleotide or peptide sequences are said to
be equivalent or derived with respect to each other, or
with respect to a reference sequence, if functionally the
corresponding biopolymers can play substantially the same
role, without being identical, in relation to the
application or use considered, or in the technique in
which they are involved; particularly equivalent are two
sequences obtained because of the natural variability, in
particular spontaneous mutation, of the species from which
they were identified, or induced mutation, as well as two
homologous sequences, the homology being defined
below,

- "variability" is understood to mean any
spontaneous or induced modification of a sequence, in
particular by substitution, and/or insertion, and/or
deletion of nucleotides and/or of nucleotide fragments,
and/or extension and/or shortening of the sequence at
least at one of the ends; a nonnatural variability may
result from the genetic engineering techniques used, for
example from the choice of the degenerate or nondegenerate
synthetic primers selected to amplify a nucleic acid; this
variability may result in modifications of any starting
sequence, considered as a reference, and which may be
expressed by a degree of homology with respect to said
reference sequence,
- homology characterizes the degree of identity of
two compared nucleotide or peptide fragments; it is
measured by the percentage identity which is in particular
determined by direct comparison of nucleotide or peptide
sequences, with respect to reference nucleotide or peptide
sequences,

- any nucleotide fragment is said to be equivalent
to or derived from a reference fragment if it has a
nucleotide sequence equivalent to the sequence of the
reference fragment; according to the preceding definition,
in particular equivalent to a reference nucleotide
fragment are:

(a) any fragment capable of hybridizing, at least
partially, with the complement of the reference
fragment,

(b) any fragment whose alignment with the
reference fragment leads to the identification of
identical contiguous bases, in a greater number than with
any other fragment obtained from another taxonomic group,

(c) any fragment resulting or capable of resulting
from the natural variability of the species from which it
is obtained,

(d) any fragment which may result from genetic
engineering techniques applied to the reference fragment,

(e) any fragment, containing at least eight
contiguous nucleotides, encoding a peptide homologous or
identical to the peptide encoded by the reference
fragment,

(f) any fragment different from the reference
fragment through insertion, deletion, substitution of at
least one monomer, extension, or shortening at least at
one of its ends; for example, any fragment corresponding
to the reference fragment, flanked at least at one of its
ends by a nucleotide sequence not encoding a polypeptide,
- polypeptide is understood to mean in particular
any peptide of at least two amino acids, in particular an
oligopeptide or protein, extracted, separated, or
substantially isolated or synthesized, through the
involvement of humans, in particular those obtained by
chemical synthesis, or through expression in a recombinant
organism,

- polypeptide partially encoded by a nucleotide
fragment is understood to mean a polypeptide having at
least three amino acids encoded by at least nine
contiguous monomers included in said nucleotide fragment,

- an amino acid is said to be analogous to another
amino acid when their respective physicochemical
characteristics, such as polarity, hydrophobicity and/or
basicity, and/or acidity, and/or neutrality, are
substantially the same; thus, a leucine is analogous to an
isoleucine,

- any polypeptide is said to be equivalent to or
derived from a reference polypeptide if the polypeptides
compared have substantially the same properties, and in
particular the same antigenic, immunological, enzymatic
and/or molecular recognition properties; in particular
equivalent to a reference polypeptide is:

(a) any polypeptide possessing a sequence in which
at least one amino acid has been replaced by an analogous
amino acid,

(b) any polypeptide having an equivalent peptide
sequence, obtained by natural or induced variation of said
reference polypeptide, and/or of the nucleotide fragment
encoding said polypeptide,

(c) a mimotope of said reference polypeptide,

(d) any polypeptide in whose sequence one or
more amino acids of the L series are replaced by an amino
acid of the D series, and vice versa,

(e) any polypeptide into whose sequence a
modification of the side chains of the amino acids has
been introduced, such as for example an acetylation of the
amine-containing functions, a carboxylation of the thiol
functions, an esterification of the carboxyl functions,

(f) any polypeptide in whose sequence one or more
peptide bonds have been modified, such as for example the
carba, retro, inverso, retro-inverso, reduced, and
methylene-oxy bonds,

(g) any polypeptide in which at least one antigen
is recognized by an antibody directed against a reference
polypeptide,

- the percentage identity characterizing the
homology between two peptide fragments compared is
according to the present invention at least 50% and
preferably at least 70%.
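
The homology criterion used throughout (at least 50%, preferably at least 70%, identity for every series of contiguous monomers of a given length) may be illustrated as follows. This is only a sketch under stated assumptions: the two sequences are taken to be already aligned and of equal length, since the description does not prescribe an alignment method, and the names `percent_identity`, `min_window_identity` and `meets_threshold` are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def min_window_identity(a: str, b: str, window: int) -> float:
    """Lowest percent identity over every contiguous window of `window` monomers."""
    if len(a) != len(b):
        raise ValueError("sequences must be of equal length (pre-aligned)")
    if window <= 0 or window > len(a):
        raise ValueError("window must be between 1 and the sequence length")
    return min(
        percent_identity(a[i:i + window], b[i:i + window])
        for i in range(len(a) - window + 1)
    )


def meets_threshold(a: str, b: str, window: int, threshold: float = 50.0) -> bool:
    """True if every contiguous window reaches the given percent identity."""
    return min_window_identity(a, b, window) >= threshold
```

For nucleotide sequences the text uses a window of 100 monomers (10 for primers); for peptides, a window of at least 30 amino acids.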

Given that a virus possessing a reverse
transcriptase enzymatic activity may be genetically
characterized both in RNA and DNA form, both the viral DNA
and RNA will be mentioned in order to characterize the
sequences relative to a virus possessing such a reverse
transcriptase activity, termed MSRV-1 according to the
present description.

The expressions of order which are used in the
present description and the claims, such as "first
nucleotide sequence", are not selected to express a
particular order, but to define the invention more
clearly.

Detection of a substance or agent is understood
below to mean an identification, a quantification or a
separation or isolation of said substance or of said
agent.

The invention will be understood more clearly on
reading the detailed description which follows, which is
made with reference to the appended figures in which:

Figure 1 represents the general structure of the
proviral DNA and the genomic RNA of MSRV-1.

Figure 2 represents the nucleotide sequence of the
clone called CL6-5' (SEQ ID NO: 112) and three potential
reading frames in amino acids presented under the
nucleotide sequence.

Figure 3 represents the nucleotide sequence of the
clone called CL6-3' (SEQ ID NO: 114) and three potential
reading frames in amino acids presented under the
nucleotide sequence.

Figure 4 represents the nucleotide sequence of the
clone called C15 (SEQ ID NO: 117) and three potential
reading frames in amino acids presented under the
nucleotide sequence.

Figure 5 represents the nucleotide sequence of the
clone called 5M6 (SEQ ID NO: 120) and three potential
reading frames in amino acids presented under the
nucleotide sequence.

Figure 6 represents the nucleotide sequence of the
clone called CL2 (SEQ ID NO: 130) and three potential
reading frames in amino acids presented under the
nucleotide sequence.

Figure 7 represents three potential reading frames
in amino acids expressed by pET28C-clone 2 and presented
under the nucleotide sequence.

Figure 8 represents three potential reading frames
in amino acids expressed by pET21C-clone 2 and presented
under the nucleotide sequence.

Figure 9 represents the nucleotide sequence of the
clone called LB13 (SEQ ID NO: 141) and three potential
reading frames in amino acids presented under the
nucleotide sequence.

Figure 10 represents the nucleotide sequence of
the clone called LA15 (SEQ ID NO: 142) and three potential
reading frames in amino acids presented under the
nucleotide sequence.

Figure 11 represents the nucleotide sequence of
the clone called LB16 (SEQ ID NO: 124) and three potential
reading frames in amino acids presented under the
nucleotide sequence.
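
The figures listed above each show a nucleotide sequence with three potential reading frames. How such frames may be derived can be sketched as below, assuming the standard genetic code and the forward strand only. The function names are hypothetical, codons containing unrecognized characters are rendered as "X", and the short input is an arbitrary example rather than one of the sequences of the figures.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering of bases.
_BASES = "TCAG"
_AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate `seq` starting at offset `frame` (0, 1 or 2); '*' marks a stop."""
    dna = seq.replace(" ", "").upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(seq: str) -> list[str]:
    """Return the three forward reading frames of `seq`."""
    return [translate(seq, frame) for frame in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(three_frames("ATGGCCTAA")):
        print(frame, protein)
```

Trailing bases that do not complete a codon are simply dropped in each frame.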
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EXAMPT/R 1: PREPARATION OF A CL6-5 1 REGION ENCODING 
THE N-TERMINAL END OF INTEGRASE AND OF A CL6-3 1 REGION 
CONTAINING THE 3' TERMINAL SEQUENCE OF THE MSRV-1 GENOME 
5 A3 1 RACE was carried out on the total RNA 

extracted from plasma from a patient suffering from MS. A 
healthy control plasma, treated under the same conditions, 
was used as negative control. The synthesis of cDNA was 
carried out with an oligo dT primer identified by SEQ ID 
10 NO: 68 (5 ! GAC TCG CTG CAG ATC GAT TTT TTT TTT TTT TTT T 
3 1 ) and the reverse transcriptase "Expand™ RT" from 
Boehringer according to the conditions recommended by the 
company. A PCR was carried out with the enzyme Klentaq 
(Clontech) under the following conditions: 94 °C 5 min then 
15 93°C 1 min, 58°C 1 min, 68°C 3 min over 40 cycles and 68°C 
for 8 min, with a final reaction volume of 50 jal . 
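
The cycling conditions above can be written down as a small
data structure, which makes the total hold time easy to check.
The following is only a sketch: the names Step and
program_minutes are hypothetical, and ramping time between
temperatures is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold of a PCR program."""
    temp_c: float
    minutes: float


def program_minutes(initial, cycle, n_cycles, final):
    """Total hold time in minutes, ignoring ramping between steps."""
    per_cycle = sum(step.minutes for step in cycle)
    return initial.minutes + n_cycles * per_cycle + final.minutes


# First PCR of Example 1, as stated in the text.
first_pcr = dict(
    initial=Step(94, 5),
    cycle=[Step(93, 1), Step(58, 1), Step(68, 3)],
    n_cycles=40,
    final=Step(68, 8),
)

total_minutes = program_minutes(**first_pcr)
print(f"total hold time: {total_minutes} min")
```

Under these assumptions the program holds for 5 + 40 x 5 + 8
minutes, i.e. a little over three and a half hours.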
Primers used for the PCR:

- 5' primer, identified by SEQ ID NO: 69
5' GCC ATC AAG CCA CCC AAG AAC TCT TAA CTT 3';
- 3' primer, identified by SEQ ID NO: 68

A second so-called "seminested" PCR was carried
out with a 5' primer situated inside the region already
amplified. This second PCR was carried out under the same
experimental conditions as those used for the first PCR,
using 10 µl of the amplification product derived from the
first PCR.

Primers used for the seminested PCR:

- 5' primer, identified by SEQ ID NO: 70
5' CCA ATA GCC AGA CCA TTA TAT ACA CTA ATT 3';
- 3' primer, identified by SEQ ID NO: 68

The primers SEQ ID NO: 69 and SEQ ID NO: 70 are
specific for the pol region of MSRV-1.
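
As an illustration, the length and GC content of the two
pol-specific primers can be computed directly from the
sequences given above. This is a minimal sketch; the helper
names clean and gc_fraction are hypothetical and are not part
of the described method.

```python
def clean(seq: str) -> str:
    """Remove the triplet spacing used in the text and upper-case."""
    return "".join(seq.split()).upper()


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the primer."""
    s = clean(seq)
    return (s.count("G") + s.count("C")) / len(s)


PRIMERS = {
    "SEQ ID NO: 69": "GCC ATC AAG CCA CCC AAG AAC TCT TAA CTT",
    "SEQ ID NO: 70": "CCA ATA GCC AGA CCA TTA TAT ACA CTA ATT",
}

for name, seq in PRIMERS.items():
    print(name, len(clean(seq)), "nt,", f"GC = {gc_fraction(seq):.0%}")
```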

An amplification product of 1.9 kb was obtained
for the plasma of the MS patient. The corresponding
fragment was not observed for the healthy control plasma.
This amplification product was cloned in the following
manner:

The amplified DNA was inserted into a plasmid with
the aid of the TA Cloning kit®. The 2 µl of DNA solution
were mixed with 5 µl of sterile distilled water, 1 µl of a
10 times concentrated ligation buffer "10X LIGATION
BUFFER", 2 µl of "pCR™ VECTOR" (25 ng/ml) and 1 µl of "T4
DNA LIGASE". This mixture was incubated overnight at 12°C.
The next steps were carried out in accordance with the
instructions for the TA Cloning kit® (Invitrogen). At the
end of the procedure, the white colonies of recombinant
bacteria were subcultured so as to be cultured and
allow the extraction of the plasmids incorporated
according to the so-called "miniprep" procedure. The
plasmid preparation of each recombinant colony was cut
with an appropriate restriction enzyme and analyzed on
agarose gel. The plasmids possessing an insert detected
under UV light after staining the gel with ethidium
bromide were selected for the sequencing of the insert
after hybridization with a primer complementary to the Sp6
promoter present on the cloning plasmid of the TA Cloning
kit®. The reaction prior to the sequencing was then carried
out according to the method recommended for using the
sequencing kit "PRISM™ Ready Reaction AmpliTaq® FS,
DyeDeoxy™ Terminator" (Applied Biosystems, ref. 402119)
and the automated sequencing was carried out on the
Applied Biosystems 373 A and 377 apparatus, according to
the manufacturer's instructions.
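
The composition of the ligation mixture described above can
be tallied as follows. This is a sketch for illustration only;
the dictionary keys are descriptive labels chosen here, and the
volumes are those stated in the text.

```python
# Volumes in microlitres, as listed for the ligation mixture.
LIGATION_UL = {
    "DNA solution": 2.0,
    "sterile distilled water": 5.0,
    "10X ligation buffer": 1.0,
    "pCR vector": 2.0,
    "T4 DNA ligase": 1.0,
}

total_ul = sum(LIGATION_UL.values())

# Final concentration of the buffer, expressed in "X" units.
buffer_final_x = 10 * LIGATION_UL["10X ligation buffer"] / total_ul

print(f"total volume: {total_ul} µl")
print(f"buffer: {buffer_final_x:.2f}X")
```

With the volumes as stated, the mixture totals 11 µl, so the
buffer ends up slightly below 1X.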

The clone obtained contains a CL6-5' region
encoding the N-terminal end of integrase and a CL6-3'
region corresponding to the 3' terminal region of MSRV-1
and making it possible to define the end of the envelope
(234 bp) and the U3 and R (401 bp) regions of the MSRV-1
retrovirus.

The region corresponding to the N-terminal end of
integrase is represented by its nucleotide sequence (SEQ
ID NO: 112) in Figure 2. The three potential reading
frames are presented by their amino acid sequence
under the nucleotide sequence, and the amino acid
sequence of the N-terminal end of integrase is identified
by SEQ ID NO: 113.
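
The "three potential reading frames" shown under each
nucleotide sequence in the figures correspond to translating
the sequence from offsets 0, 1 and 2. A minimal sketch of that
operation is given below, assuming the standard genetic code;
the function names translate and three_frames are hypothetical,
and the short input sequence is a toy example, not one of the
sequences of the invention.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a DNA sequence from the given frame (0, 1 or 2).

    Stop codons are written '*'; codons with unknown bases become 'X'.
    """
    s = "".join(seq.split()).upper()
    return "".join(
        CODON_TABLE.get(s[i:i + 3], "X") for i in range(frame, len(s) - 2, 3)
    )


def three_frames(seq: str) -> list:
    """Return the three forward-frame translations of a sequence."""
    return [translate(seq, frame) for frame in range(3)]


for frame, protein in enumerate(three_frames("ATG GCC TAA"), start=1):
    print(f"frame {frame}: {protein}")
```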

The CL6-3' region is represented by its nucleotide
sequence (SEQ ID NO: 114) in Figure 3. The three potential
reading frames are presented by their amino acid
sequence under the nucleotide sequence. An amino
acid sequence corresponding to the C-terminal end of the
MSRV-1 env protein is identified by SEQ ID NO: 115.

In order to evaluate the promoter activity of the
LTR obtained from clone 6 (cl6), a test of promoter
activity using the enzyme CAT (chloramphenicol acetyl
transferase) was carried out with the corresponding U3R
region. In parallel, a clone containing the same U3R
region of endogenous retroviral RNA expressed in normal
placenta (PH74) and a clone (5M6) obtained from DNA were
tested. The result presented in Figure 12 shows a very
high promoter activity of the LTR derived from MS plasma
(cl6) and a much lower activity with the
sequences of non-MS endogenous origin.

EXAMPLE 2: PREPARATION OF THE C15 CLONE CONTAINING
THE REGION ENCODING A PORTION OF THE MSRV-1 RETROVIRUS
ENVELOPE

An RT-PCR was carried out on the total RNA
extracted from virions concentrated by ultracentrifugation
of a synoviocyte culture supernatant obtained from an MS
patient. The synthesis of cDNA was carried out with an
oligo dT primer and the reverse transcriptase "Expand™ RT"
from Boehringer according to the conditions recommended by
the company. A PCR was carried out with the Expand™ Long
Template PCR System (Boehringer) under the following
conditions: 94°C 5 min then 93°C 1 min, 60°C 1 min, 68°C 3
min over 40 cycles and 68°C for 8 min, with a final
reaction volume of 50 µl.

Primers used for the PCR:

- 5' primer, identified by SEQ ID NO: 69
5' GCC ATC AAG CCA CCC AAG AAC TCT TAA CTT 3';
- 3' primer, identified by SEQ ID NO: 116
5' TGG GGT TCC ATT TGT AAG ACC ATC TGT AGC TT 3'

A second so-called "seminested" PCR was carried
out with a 5' primer situated inside the region already
amplified. This second PCR was carried out under the same
experimental conditions as those used for the first PCR
(except that 30 cycles were used instead of 40), using 10
µl of the amplification product derived from the first
PCR.

Primers used for the seminested PCR:

- 5' primer, identified by SEQ ID NO: 70
5' CCA ATA GCC AGA CCA TTA TAT ACA CTA ATT 3';
- 3' primer, identified by SEQ ID NO: 116

The primers SEQ ID NO: 69 and SEQ ID NO: 70 are
specific for the pol region of MSRV-1. The primer SEQ ID
NO: 116 is specific for the sequence FBd13 (also called
B13) and is located in the conserved env region among the
oncoretroviruses.
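
Because a 3' primer anneals to the strand opposite to the one
read by the 5' primer, the site it recognizes on the sense
strand is its reverse complement. The sketch below shows that
operation for the primer SEQ ID NO: 116; the function name
reverse_complement is hypothetical, and the code makes no claim
about where the resulting site lies in the env region.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    s = "".join(seq.split()).upper()
    return s.translate(COMPLEMENT)[::-1]


SEQ_ID_NO_116 = "TGG GGT TCC ATT TGT AAG ACC ATC TGT AGC TT"

site_on_sense_strand = reverse_complement(SEQ_ID_NO_116)
print(site_on_sense_strand)
```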

An amplification product of 1932 bp was obtained
and cloned in the following manner:

the amplified DNA was inserted into a plasmid with
the aid of the TA Cloning kit®. The various steps were
carried out in accordance with the instructions for the TA
Cloning kit® (Invitrogen). At the end of the procedure, the
white colonies of recombinant bacteria were
subcultured so as to be cultured and allow the extraction
of the plasmids incorporated according to the so-called
"miniprep" procedure. The plasmid preparation of each
recombinant colony was cut with an appropriate restriction
enzyme and analyzed on agarose gel. The plasmids
possessing an insert detected under UV light after
staining the gel with ethidium bromide were selected for
the sequencing of the insert after hybridization with a
primer complementary to the SP6 promoter present on the
cloning plasmid of the TA Cloning kit®. The reaction prior
to the sequencing was then carried out according to the
method recommended for using the sequencing kit "PRISM™
Ready Reaction AmpliTaq® FS, DyeDeoxy™ Terminator"
(Applied Biosystems, ref. 402119) and the automated
sequencing was carried out on the Applied Biosystems 373 A
and 377 apparatus, according to the manufacturer's
instructions.

The C15 clone obtained contains a region
corresponding to the region of the MSRV-1 envelope of 1481
bp.

The env region of the C15 clone is represented by
its nucleotide sequence (SEQ ID NO: 117) in Figure 4. The
three potential reading frames of this clone are presented
by their amino acid sequence under the nucleotide
sequence. The reading frame corresponding to an MSRV-1
structural env protein is identified by SEQ ID NO: 118.

From the defined sequences obtained from clones
cl6 and C15, it was possible to produce a plasmid
construct encoding a complete envelope followed by the 3'
LTR, as presented in Figure 13 with the corresponding
reading frame.

EXAMPLE 3: PREPARATION OF A 5M6 CLONE CONTAINING
THE SEQUENCES OF THE 3' TERMINAL REGION OF THE ENVELOPE,
FOLLOWED BY THE MSRV-1 PROVIRAL TYPE U3, R AND U5
SEQUENCES

A monodirectional PCR was carried out on the DNA
extracted from immortalized B lymphocytes in culture from
an MS patient. The PCR was carried out with the Expand™ Long
Template PCR System (Boehringer) under the following
conditions: 94°C 3 min then 93°C 1 min, 60°C 1 min, 68°C 3
min over 10 cycles, then 93°C 1 min, 60°C 1 min with 15
sec of extension at each cycle, 68°C 3 min over 35 cycles
and 68°C for 7 min, with a final reaction volume of 50
µl.

The primer used for the PCR, identified by SEQ ID
NO: 119, is 5' TCA AAA TCG AAG AGC TTT AGA CTT GCT AAC CG
3'.

The primer SEQ ID NO: 119 is specific for
the env region of the C15 clone.

An amplification product of 1673 bp was obtained
and cloned in the following manner:

the amplified DNA was inserted into a plasmid with
the aid of the TA Cloning kit®. The various steps were
carried out in accordance with the instructions for the TA
Cloning kit® (Invitrogen). At the end of the procedure, the
white colonies of recombinant bacteria were
subcultured so as to be cultured and allow the extraction
of the plasmids incorporated according to the so-called
"miniprep" procedure. The plasmid preparation of each
recombinant colony was cut with an appropriate restriction
enzyme and analyzed on agarose gel. The plasmids
possessing an insert detected under UV light after
staining the gel with ethidium bromide were selected for
the sequencing of the insert after hybridization with a
primer complementary to the T7 promoter present on the
cloning plasmid of the TA Cloning kit®. The reaction prior
to the sequencing was then carried out according to the
method recommended for using the sequencing kit "PRISM™
Ready Reaction AmpliTaq® FS, DyeDeoxy™ Terminator"
(Applied Biosystems, ref. 402119) and the automated
sequencing was carried out on the Applied Biosystems 373 A
and 377 apparatus, according to the manufacturer's
instructions.

The 5M6 clone obtained contains a region
corresponding to the 3' region of the MSRV-1 envelope of
492 bp followed by the regions U3, R and U5 (837 bp) of
MSRV-1.

The 5M6 clone is represented by its nucleotide
sequence (SEQ ID NO: 120) in Figure 5. The three potential
reading frames of this clone are presented by their amino
acid sequence under the nucleotide sequence. The
reading frame corresponding to the C-terminal end of the
MSRV-1 env protein is identified by SEQ ID NO: 121.



EXAMPLE 4: PREPARATION OF THE LB16 CLONE
CONTAINING THE REGION ENCODING THE MSRV-1 RETROVIRUS
INTEGRASE

An RT-PCR was carried out on the total RNA treated
with DNase I and extracted from a choroid plexus obtained
from an MS patient. The synthesis of cDNA was carried out
with an oligo dT primer and the reverse transcriptase
"Expand™ RT" from Boehringer according to the conditions
recommended by the company. A "no RT" control was carried
out in parallel on the same material. A PCR was carried
out with Taq polymerase (Perkin Elmer) under the following
conditions: 95°C 5 min, then 95°C 1 min, 55°C 1 min, 72°C
2 min over 35 cycles and 72°C for 8 min, with a final
reaction volume of 50 µl.

Primers used for the PCR:

- 5' primer, identified by SEQ ID NO: 122
5' GGC ATT GAT AGC ACC CAT CAG 3';
- 3' primer, identified by SEQ ID NO: 123
5' CAT GTC ACC AGG GTG GAA TAG 3'

The primer SEQ ID NO: 122 is specific for the pol
region of MSRV-1 and more precisely similar to the
integrase region described above. The primer SEQ ID NO: 123
was defined on sequences of the clones obtained during
preliminary tests.

An amplification product of about 760 bp was
obtained only in the test with RT and was cloned in the
following manner:

the amplified DNA was inserted into a plasmid with
the aid of the TA Cloning kit®. The various steps were
carried out in accordance with the instructions for the TA
Cloning kit® (Invitrogen). At the end of the procedure, the
white colonies of recombinant bacteria were
subcultured so as to be cultured and allow the extraction
of the plasmids incorporated according to the so-called
"miniprep" procedure. The plasmid preparation of each
recombinant colony was cut with an appropriate restriction
enzyme and analyzed on agarose gel. The plasmids
possessing an insert detected under UV light after
staining the gel with ethidium bromide were selected for
the sequencing of the insert after hybridization with a
primer complementary to the T7 promoter present on the
cloning plasmid of the TA Cloning kit®. The reaction prior
to the sequencing was then carried out according to the
method recommended for using the sequencing kit "PRISM™
Ready Reaction AmpliTaq® FS, DyeDeoxy™ Terminator"
(Applied Biosystems, ref. 402119) and the automated
sequencing was carried out on the Applied Biosystems 373 A
and 377 apparatus, according to the manufacturer's
instructions.

The LB16 clone obtained contains the sequences
corresponding to integrase. The nucleotide sequence of
this clone is identified by SEQ ID NO: 124 and is
represented in Figure 11, in which three reading frames
are determined.

EXAMPLE 5: PREPARATION OF A CLONE 2, CL2,
CONTAINING IN 3' A PORTION HOMOLOGOUS TO THE POL GENE,
CORRESPONDING TO THE PROTEASE GENE, AND TO THE GAG GENE
(GM3) CORRESPONDING TO THE NUCLEOCAPSID, AND A NEW
5' CODING REGION, CORRESPONDING TO THE GAG GENE, MORE
SPECIFICALLY THE MATRIX AND THE CAPSID OF MSRV-1

A PCR amplification was carried out on the total
RNA extracted from 100 µl of plasma from a patient
suffering from MS. A water control, treated under the same
conditions, was used as negative control. The synthesis of
cDNA was carried out with 300 pmol of a random primer
(GIBCO-BRL, France) and the reverse transcriptase "Expand
RT" (BOEHRINGER MANNHEIM, France) according to the
conditions recommended by the company. An amplification by
PCR ("polymerase chain reaction") was carried out with the
enzyme Taq polymerase (Perkin Elmer, France) using 10 µl
of cDNA under the following conditions: 94°C 2 min, 55°C 1
min, 72°C 2 min then 94°C 1 min, 55°C 1 min, 72°C 2 min
over 30 cycles and 72°C for 7 min, with a final reaction
volume of 50 µl.

Primers used for the PCR amplification:

- 5' primer, identified by SEQ ID NO: 126
5' CGG ACA TCC AAA GTG ATG GGA AAC G 3';
- 3' primer, identified by SEQ ID NO: 127
5' GGA CAG GAA AGT AAG ACT GAG AAG GC 3'

A second amplification by so-called "seminested"
PCR was carried out with a 5' primer situated inside the
region already amplified. This second PCR was carried out
under the same experimental conditions as those used
during the first PCR, using 10 µl of the amplification
product derived from the first PCR.

Primers used for the amplification by seminested
PCR:

- 5' primer, identified by SEQ ID NO: 128
5' CCT AGA ACG TAT TCT GGA GAA TTG GG 3';
- 3' primer, identified by SEQ ID NO: 129
5' TGG CTC TCA ATG GTC AAA CAT ACC CG 3'

The primers SEQ ID NO: [lacuna] and SEQ ID NO:
[lacuna] are specific for the pol region, clone G+E+A,
more specifically the E region: nucleotide position No.
423 to No. 448. The primers used in the 5' region were
defined on sequences of clones obtained during preliminary
tests.

An amplification product of 1511 bp was obtained
from the RNA extracted from the plasma of an MS patient.
The corresponding fragment was not observed for the water
control. This amplification product was cloned in the
following manner.

The amplified DNA was inserted into a plasmid with
the aid of the TA Cloning kit®. The 2 µl of DNA solution
were mixed with 5 µl of sterile distilled water, 1 µl of
a 10 times concentrated ligation buffer "10X LIGATION
BUFFER", 2 µl of "pCR™ VECTOR" (25 ng/ml) and 1 µl of "T4
DNA LIGASE". This mixture was incubated overnight at 14°C.
The following steps were carried out in accordance with
the instructions of the TA Cloning kit® (Invitrogen). The
mixture was plated after transformation of the ligation
into E. coli INVαF' bacteria. At the end of the procedure,
the white colonies of recombinant bacteria were
subcultured so as to be cultured and allow the extraction
of the plasmids incorporated according to the so-called
"DNA minipreparation" procedure (17). The plasmid
preparation of each recombinant colony was cut with the
restriction enzyme EcoRI and analyzed on agarose gel. The
plasmids possessing an insert detected under UV light
after staining the gel with ethidium bromide were selected
for the sequencing of the insert after hybridization with
a primer complementary to the T7 promoter present on the
cloning plasmid of the TA Cloning kit®. The reaction prior
to the sequencing was then carried out according to the
method recommended for using the sequencing kit "PRISM™
Ready Reaction AmpliTaq® FS, DyeDeoxy™ Terminator"
(Applied Biosystems, ref. 402119) and the automated
sequencing was carried out on the Applied Biosystems 373 A
and 377 apparatus, according to the manufacturer's
instructions.

The clone obtained, called CL2, contains a
C-terminal region similar to the 5' terminal region of the
clones G+E+A of MSRV-1, which makes it possible to define
the C-terminal region of the gag gene and a new region
corresponding to the N-terminal region of the MSRV-1 gag
gene.

CL2 makes it possible to define a region of 1511
bp having an open reading frame in the N-terminal region
of 1077 bp encoding 359 amino acids and a non-open reading
frame of 454 bp corresponding to the C-terminal region of
the MSRV-1 gag gene.
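
The relation between the length of the open reading frame and
the number of amino acids stated above is simply one codon of
three nucleotides per residue. A minimal sketch of that check
follows; the function name codons_in is hypothetical.

```python
def codons_in(orf_bp: int) -> int:
    """Number of complete codons in an open reading frame.

    Raises ValueError if the length is not a multiple of three,
    since such a length cannot be read as whole codons.
    """
    if orf_bp % 3 != 0:
        raise ValueError(f"{orf_bp} bp is not a whole number of codons")
    return orf_bp // 3


print(codons_in(1077))
```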

The nucleotide sequence of CL2 is identified by
SEQ ID NO: 130. It is represented in Figure 6 with the
potential reading frames in amino acids.

The 1077 bp fragment of CL2 encoding 359 amino
acids was amplified by PCR with the Pwo enzyme (5 U/µl)
(Boehringer Mannheim, France) using 1 µl of the DNA
minipreparation of clone 2 under the following conditions:
95°C 1 min, 60°C 1 min, 72°C 2 min over 25 cycles and with
a final reaction volume of 50 µl with the aid of the
primers:

- 5' primer (BamHI), identified by SEQ ID NO: 132
5' TGC TGG AAT TCG GGA TCC TAG AAC GTA TTC 3' (30 mer), and
- 3' primer (HindIII), identified by SEQ ID NO: 133
5' AGT TCT GCT CCG AAG CTT AGG CAG ACT TTT 3' (30 mer)

corresponding, respectively, to the nucleotide
sequence of clone 2 at position -9 to 21 and 1066 to 1095.

The fragment obtained by PCR was digested with
BamHI and HindIII and subcloned into the expression
vectors pET28C and pET21C (NOVAGEN) linearized with BamHI
and HindIII. The sequencing of the DNA of the 1077 bp
fragment of clone 2 in the two expression vectors was
carried out according to the method recommended for the
use of the sequencing kit "PRISM™ Ready Reaction AmpliTaq®
FS, DyeDeoxy™ Terminator" (Applied Biosystems, ref. 402119)
and the automated sequencing was carried out on the
Applied Biosystems 373 A and 377 apparatus, according to
the manufacturer's instructions.

The expression of the nucleotide sequence of the
1077 bp fragment of clone 2 by the expression vectors
pET28C and pET21C is identified by SEQ ID NO: 135 and SEQ
ID NO: 137, respectively.

EXAMPLE 6: EXPRESSION OF CLONE 2 IN ESCHERICHIA
COLI

The constructs pET28C-clone 2 (1077 bp) and
pET21C-clone 2 (1077 bp) synthesize, in the bacterial
strain BL21 (DE3), a protein fused at the N- and C-terminus
for the vector pET28C and the C-terminus for the vector
pET21C with 6 Histidines, having an apparent molecular
mass of about 45 kDa, identified by SDS-PAGE
polyacrylamide gel electrophoresis (SDS = Sodium Dodecyl
Sulfate) (Laemmli, 1970 (1)). The reactivity of the
protein was demonstrated towards an anti-Histidine
monoclonal antibody (DIANOVA) by the Western-blot
technique (Towbin et al., 1979 (2)).
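
As a rough plausibility check on the apparent mass of about
45 kDa, the sketch below estimates the mass of the 359-residue
product using the common rule of thumb of roughly 110 Da per
amino acid residue. The average residue mass and the function
name are assumptions made for illustration. The histidine tags
and any vector-encoded residues are not counted, so the
estimate is expected to fall below the apparent mass observed
on the gel.

```python
# Rule-of-thumb average mass of one amino acid residue (assumption).
AVERAGE_RESIDUE_DA = 110.0


def estimated_mass_kda(n_residues: int,
                       average_residue_da: float = AVERAGE_RESIDUE_DA) -> float:
    """Approximate polypeptide mass in kDa from its residue count."""
    return n_residues * average_residue_da / 1000.0


clone2_kda = estimated_mass_kda(359)
print(f"{clone2_kda:.1f} kDa before tags")
```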

The recombinant proteins pET28C-clone 2 (1077 bp)
and pET21C-clone 2 (1077 bp) were visualized by SDS-PAGE
in the insoluble fraction after enzymatic digestion of the
bacterial extracts with 50 µl of lysozyme (10 mg/ml) and
ultrasound lysis.

The antigenic properties of the recombinant
antigens pET28C-clone 2 (1077 bp) and pET21C-clone 2 (1077
bp) were tested by Western blotting after
solubilization of the bacterial pellet with 2% SDS and 50
mM β-mercaptoethanol. After incubation with sera from
patients suffering from multiple sclerosis, the sera from
neurological controls and the sera from controls at the
Blood Transfusion Center (CTS), the immunocomplexes were
detected with the aid of an alkaline phosphatase-coupled
goat serum anti-human IgG and anti-human IgM.

The results are presented in the table below. 

TABLE 

Reactivity of sera affected by multiple sclerosis and
controls with the MSRV-1 recombinant protein gag clone 2
(1077 bp) = pET21C-clone 2 (1077 bp) and pET28C-clone 2
(1077 bp) (a)

DISEASE                  NUMBER OF            NUMBER OF POSITIVE
                         INDIVIDUALS TESTED   INDIVIDUALS

MS                       15                   6: 2 (+++), 2 (++), 2 (+)
NEUROLOGICAL CONTROLS                         1 (+++)
HEALTHY CONTROLS (CTS)                        1 (+/-)



(a) The strips containing 1.5 µg of recombinant antigen
pET-gag clone 2 (1077 bp) exhibit reactivity against sera
diluted 1/100. The Western-Blot interpretation is based on
the presence or absence of a specific pET-gag clone 2
(1077 bp) band on the strips. Positive and negative
controls are included in each experiment.

These results show that, under the technical
conditions used, about 40% of the sera from patients
affected by multiple sclerosis which were tested react
with the recombinant proteins pET28C-clone 2 (1077 bp) and
pET21C-clone 2 (1077 bp). Reactivity was observed on a
neurological control and it is of interest to note that
the RNAs extracted from this serum, after the reverse
transcriptase step, are also amplified by PCR in the pol
region. This suggests that people who have not declared MS
may also harbor and express this virus. On the other hand,
an apparently healthy control (CTS donor) possesses
anti-gag (clone 2, 1077 bp) antibodies. This is compatible
with an immunity acquired against MSRV-1 independently of
a declared associated autoimmune disease.
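
The figure of about 40% can be retraced from the table,
reading it as 15 MS sera tested, of which 2 scored (+++),
2 scored (++) and 2 scored (+). That reading of the table, and
the names used below, are assumptions made for this sketch.

```python
def percent_positive(positive: int, tested: int) -> float:
    """Percentage of tested sera that reacted."""
    if tested <= 0:
        raise ValueError("number tested must be positive")
    return 100.0 * positive / tested


# MS sera by reactivity grade, as read from the table (assumption).
ms_grades = {"+++": 2, "++": 2, "+": 2}
ms_tested = 15

ms_positive = sum(ms_grades.values())
print(f"{percent_positive(ms_positive, ms_tested):.0f}% of MS sera reactive")
```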



EXAMPLE 7: PREPARATION OF AN LB13 CLONE CONTAINING IN 3' A
PORTION HOMOLOGOUS TO CLONE 2 CORRESPONDING TO THE GAG
GENE AND IN 5' A PORTION HOMOLOGOUS TO THE 5M6 CLONE
CORRESPONDING TO THE U5 LTR REGION

An RT-PCR ("reverse transcriptase-polymerase chain
reaction") was carried out using total RNA extracted from
virions, obtained from supernatants of B lymphocyte cells
of patients suffering from multiple sclerosis,
concentrated by ultracentrifugation. The synthesis of
cDNA was carried out with a specific primer SEQ No. XXX
and the reverse transcriptase "Expand™ RT" from BOEHRINGER
MANNHEIM according to the conditions recommended by the
company.

Primer used for the synthesis of the cDNA, identified by
SEQ ID NO: 138:
5' CTT GGA GGG TGC ATA ACC AGG GAA T 3'

A PCR amplification was carried out with Taq
polymerase (Perkin Elmer, France) under the following
conditions: 94°C 1 min, 55°C 1 min, 72°C 2 min over 35
cycles and 72°C for 7 min, with a final reaction volume
of 100 µl.

Primers used for the PCR amplification:

- 5' primer, identified by SEQ ID NO: 139
5' TGT CCG CTG TGC TCC TGA TC 3'
- 3' primer, identified by SEQ ID NO: 138
5' CTT GGA GGG TGC ATA ACC AGG GAA T 3'

A second so-called "seminested" PCR amplification
was carried out with a 3' primer situated inside the
region already amplified. This second amplification was
carried out under the same experimental conditions as
those used during the first amplification, using 10 µl of
the amplification product derived from the first PCR.

Primers used for the "seminested" PCR amplification:

- 5' primer, identified by SEQ ID NO: 139
5' TGT CCG CTG TGC TCC TGA TC 3'
- 3' primer, identified by SEQ ID NO: 140
5' CTA TGT CCT TTT GGA CTG TTT GGG T 3'

The primers SEQ ID NO: 138 and SEQ ID NO: 140 are
specific for the gag region, clone 2 nucleotide position
No. 373-397 and No. 433-456. The primers used in the 5'
region were defined on sequences of the clones obtained
during preliminary tests.

An amplification product of 764 bp was obtained
and cloned in the following manner:

The amplified DNA was inserted into a plasmid with
the aid of the TA Cloning kit®. The 2 µl of DNA solution
were mixed with 5 µl of sterile distilled water, 1 µl of a
10 times concentrated ligation buffer "10X LIGATION
BUFFER", 2 µl of "pCR™ VECTOR" (25 ng/ml) and 1 µl of "T4
DNA LIGASE". This mixture was incubated overnight at 14°C.
The following steps were carried out in accordance with
the instructions of the TA Cloning kit® (Invitrogen). The
mixture was plated after transformation of the ligation
into E. coli INVαF' bacteria. At the end of the procedure,
the white colonies of recombinant bacteria were
subcultured so as to be cultured and allow the extraction
of the plasmids incorporated according to the so-called
"DNA minipreparation" procedure (17). The plasmid
preparation of each recombinant colony was cut with the
restriction enzyme EcoRI and analyzed on agarose gel. The
plasmids possessing an insert detected under UV light
after staining the gel with ethidium bromide were selected
for the sequencing of the insert after hybridization with
a primer complementary to the T7 promoter present on the
cloning plasmid of the TA Cloning kit®. The reaction prior
to the sequencing was then carried out according to the
method recommended for using the sequencing kit "PRISM™
Ready Reaction AmpliTaq® FS, DyeDeoxy™ Terminator"
(Applied Biosystems, ref. 402119) and the automated
sequencing was carried out on the Applied Biosystems 373 A
and 377 apparatus, according to the manufacturer's
instructions.

The LB13 clone obtained contains an N-terminal
region of MSRV-1 gag gene homologous to clone 2 and an LTR
corresponding to a portion of the U5 region. Between the
U5 region and gag, a binding site for the transfer RNAs,
the PBS "primer binding site", was identified.

The nucleotide sequence of the 764 bp fragment of
the LB13 clone in the plasmid "pCR™ vector" is represented
in the identifier SEQ ID NO: 141.

The binding site for the transfer RNAs, having a
sequence of PBS tryptophan type, was identified at
nucleotide position No. 342-359 of the LB13 clone.
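
Nucleotide positions such as No. 342-359 are 1-based and
inclusive, so the span covers 18 nucleotides. The sketch below
shows how such coordinates map onto a sequence held as a
string; the function names are hypothetical, and the
demonstration string is a placeholder, not the LB13 sequence.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive nucleotide span."""
    return end - start + 1


def subsequence(seq: str, start: int, end: int) -> str:
    """Extract a 1-based, inclusive span from a sequence string."""
    if start < 1 or end > len(seq) or start > end:
        raise ValueError("span outside the sequence")
    return seq[start - 1:end]


print(span_length(342, 359))

# Placeholder sequence used only to demonstrate the indexing.
demo = "ABCDEFG"
print(subsequence(demo, 2, 4))
```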

As this same PBS was found in the endogenous
copies homologous to MSRV-1, the endogenous family thus
defined is henceforth called HERV-W, according to the
nomenclature proposed for the endogenous retrovirus
families (W = tryptophan).

A short ORF of about 65 amino acids was found in
the U5 region of the 5' LTR of the LB13 clone.
Sequence of the ORF:
PMASNRAITLTAWSKIPFLGIRETKNPRSENTRLATMLEAAHHHFGSSPPLSWEL
WEQGPQVTIW.

The corresponding nucleotide sequence starting at
an ATG codon is capable of being expressed in a subgenomic
DNA from a proviral LTR (U3RU5).

Another clone, called LA15, was obtained from the
total RNA extracted from virions concentrated by
ultracentrifugation from a culture supernatant of
synoviocytes obtained from a patient suffering from
rheumatoid arthritis. The strategy for amplifying and
cloning the LA15 clone is exactly the same as that used
for the LB13 clone.

The nucleotide sequence of the LA15 clone, which
is represented in the identifier SEQ ID NO: 142, is very
similar to that of the LB13 clone. This suggests that the
MSRV-1 retrovirus detected in multiple sclerosis has
sequences which are similar to those found in rheumatoid
arthritis.

CLAIMS

1. Nucleic material, in isolated or purified state,
comprising a nucleotide sequence chosen from the group
which consists of (i) the sequences SEQ ID NO: 112, SEQ ID
NO: 114, SEQ ID NO: 117, SEQ ID NO: 120, SEQ ID NO: 124,
SEQ ID NO: 130, SEQ ID NO: 141 and SEQ ID NO: 142; (ii)
the sequences complementary to sequences (i); and (iii)
the sequences equivalent to sequences (i) or (ii), in
particular the sequences having, for every series of 100
contiguous monomers, at least 50%, and preferentially at
least 70% homology with sequences (i) or (ii)
respectively.
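
The criterion "for every series of 100 contiguous monomers,
at least 50% homology" can be illustrated as a sliding-window
check. The sketch below rests on two simplifying assumptions
that are not taken from the claims: the two sequences are
already aligned without gaps and have equal length, and
homology is read as percent identity. The function names are
hypothetical.

```python
def min_window_identity(a: str, b: str, window: int = 100) -> float:
    """Lowest fraction of identical positions over all windows.

    Both sequences must be pre-aligned, ungapped and of equal
    length, and at least one full window long.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if len(a) < window:
        raise ValueError("sequences are shorter than the window")
    matches = [x == y for x, y in zip(a, b)]
    current = sum(matches[:window])
    lowest = current
    # Slide the window one position at a time, updating the count.
    for i in range(window, len(matches)):
        current += matches[i] - matches[i - window]
        lowest = min(lowest, current)
    return lowest / window


def is_equivalent(a: str, b: str, window: int = 100,
                  threshold: float = 0.5) -> bool:
    """True if every window meets the identity threshold."""
    return min_window_identity(a, b, window) >= threshold


# Toy sequences: the first 60 positions of the second are altered.
reference = "ACGT" * 30
variant = "T" * 60 + reference[60:]
print(min_window_identity(reference, variant))
print(is_equivalent(reference, variant, threshold=0.5))
print(is_equivalent(reference, variant, threshold=0.7))
```

In this toy case the variant passes the 50% threshold but not
the preferred 70% threshold.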

2. Nucleic material, in isolated or purified state,
encoding a polypeptide having, for every contiguous series
of at least 30 amino acids, at least 50%, and preferably
at least 70% homology with a peptide sequence chosen from
the group which consists of SEQ ID NO: 113, SEQ ID NO:
115, SEQ ID NO: 118, SEQ ID NO: 121, SEQ ID NO: 135 and
SEQ ID NO: 137.

3. Retroviral nucleic material, whose pol gene comprises a
nucleotide sequence identical or equivalent to a sequence
chosen from the group which consists of SEQ ID NO: 112,
SEQ ID NO: 124 and their complementary sequences.

4. Retroviral nucleic material, in which the 5' end of the
pol gene starts at nucleotide 1419 of SEQ ID NO: 130.

5. Retroviral nucleic material, in which the pol gene
encodes a polypeptide having, for every contiguous series
of at least 30 amino acids, at least 50%, and preferably
at least 70% homology with the peptide sequence SEQ ID NO:
113.

6. Retroviral nucleic material, in which the 3' end of the 
gag gene ends at nucleotide 1418 of SEQ ID NO: 130. 

7. Retroviral nucleic material, in which the env gene
comprises a nucleotide sequence identical or equivalent to
a sequence chosen from the group which consists of SEQ ID
NO: 117 and its complementary sequence.

8. Retroviral nucleic material, in which the env gene
comprises a nucleotide sequence which starts at nucleotide
1 of SEQ ID NO: 117 and ends at nucleotide
233 of SEQ ID NO: 114.

9. Retroviral nucleic material, in which the env gene
encodes a polypeptide having, for every contiguous series
of at least 30 amino acids, at least 50%, and preferably
at least 70% homology with the sequence SEQ ID NO: 118.

10. Retroviral nucleic material in which the U3R region of 
the 3 1 LTR comprises a nucleotide sequence which ends at 
nucleotide 617 of SEQ ID NO: 114. 

11. Retroviral nucleic material in which the RU5 region of 
15 the 5 f LTR comprises a nucleotide sequence which starts at 

nucleotide 755 of SEQ ID NO: 120 and ends at nucleotide 
337 of SEQ ID NO: 141 or SEQ ID NO: 142. 

12. Retroviral nucleic material comprising a sequence 
which starts at nucleotide 755 of SEQ ID NO: 120 and which
ends at nucleotide 617 of SEQ ID NO: 114.

13. Retroviral nucleic material according to any one of 
the preceding claims, characterized in that it is 
associated with at least one autoimmune disease such as 
multiple sclerosis or rheumatoid arthritis. 

14. Nucleotide fragment comprising a nucleotide sequence
chosen from the group which consists of (i) the sequences
SEQ ID NO: 112, SEQ ID NO: 114, SEQ ID NO: 117, SEQ ID NO:
120, SEQ ID NO: 124, SEQ ID NO: 130, SEQ ID NO: 141 and
SEQ ID NO: 142; (ii) the sequences complementary to
sequences (i); and (iii) the sequences equivalent to
sequences (i) or (ii), in particular the sequences having, 
for every series of 100 contiguous monomers, at least 50%, 
and preferentially at least 70% homology with sequences 
(i) or (ii) respectively. 
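
The criterion "for every series of 100 contiguous monomers, at least 50% ... homology" can be read as a sliding-window test. The sketch below is one hedged interpretation: it assumes that "homology" means percent identity between two sequences that have already been aligned without gaps and have equal length, which the claims themselves do not spell out. The function names and the toy sequences are illustrative assumptions.

```python
def min_window_identity(a: str, b: str, window: int) -> float:
    """Lowest fraction of identical positions over every window of the given size.

    Assumes a and b are already aligned, gap-free and of equal length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if window <= 0 or len(a) < window:
        raise ValueError("window must be positive and no longer than the sequences")
    matches = [x == y for x, y in zip(a, b)]
    count = sum(matches[:window])          # matches in the first window
    lowest = count
    for i in range(window, len(matches)):  # slide the window one position at a time
        count += matches[i] - matches[i - window]
        lowest = min(lowest, count)
    return lowest / window


def meets_threshold(a: str, b: str, window: int = 100, threshold: float = 0.50) -> bool:
    """True if every window reaches the threshold, i.e. the worst window does."""
    return min_window_identity(a, b, window) >= threshold


# Toy data (hypothetical): 120 positions with a block of 30 mismatches.
reference = "A" * 120
candidate = "A" * 50 + "C" * 30 + "A" * 40
print(min_window_identity(reference, candidate, 100))
```

The same function covers the peptide claims by passing `window=30`, and the primer claim by passing `window=10`. Checking only the worst window is enough because the claim wording requires the threshold to hold for every series.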

15. Nucleotide fragment according to claim 14, consisting
of a nucleotide sequence chosen from the group which
consists of (i) the sequences SEQ ID NO: 112, SEQ ID NO:
114, SEQ ID NO: 117, SEQ ID NO: 120, SEQ ID NO: 124, SEQ
ID NO: 130, SEQ ID NO: 141 and SEQ ID NO: 142; (ii) the
sequences complementary to sequences (i); and (iii) the
sequences equivalent to sequences (i) or (ii), in
particular the sequences having, for every series of 100
contiguous monomers, at least 50%, and preferentially at
least 70% homology with sequences (i) or (ii)
respectively.

16. Nucleotide fragment comprising a nucleotide sequence
encoding a polypeptide having, for every contiguous series
of at least 30 amino acids, at least 50%, and preferably
at least 70% homology with a peptide sequence chosen from
the group which consists of SEQ ID NO: 113, SEQ ID NO:
115, SEQ ID NO: 118, SEQ ID NO: 121, SEQ ID NO: 135 and
SEQ ID NO: 137.

17. Nucleotide fragment according to claim 16, consisting
of a nucleotide sequence encoding a polypeptide having, for
every contiguous series of at least 30 amino acids, at
least 50%, and preferably at least 70% homology with a
peptide sequence chosen from the group which consists of
SEQ ID NO: 113, SEQ ID NO: 115, SEQ ID NO: 118, SEQ ID NO:
121, SEQ ID NO: 135 and SEQ ID NO: 137.

18. Nucleic probe for the detection of a retrovirus
associated with multiple sclerosis and/or rheumatoid
arthritis, characterized in that it is capable of
hybridizing specifically with any fragment according to
any one of claims 14 to 17, belonging to the genome of
said retrovirus.

19. Probe according to claim 18, characterized in that it
possesses from 10 to 100 nucleotides, preferably from 10 
to 30 nucleotides. 

20. Primer for the amplification, by polymerization, of an
RNA or of a DNA of a retrovirus associated with multiple
sclerosis and/or rheumatoid arthritis, characterized in
that it comprises a nucleotide sequence identical or
equivalent to at least a portion of the nucleotide
sequence of a fragment according to any one of claims 8 to
11, in particular a nucleotide sequence having, for every
series of 10 contiguous monomers, at least 50%, preferably
at least 70% homology with at least said portion of said
fragment.

21. Primer according to claim 20, characterized in
that its nucleotide sequence is chosen from SEQ ID NO:
116, SEQ ID NO: 119, SEQ ID NO: 122, SEQ ID NO: 123, SEQ
ID NO: 126, SEQ ID NO: 127, SEQ ID NO: 128, SEQ ID NO:
129, SEQ ID NO: 132, and SEQ ID NO: 133.

22. RNA or DNA, and in particular replication and/or
expression vector, comprising a genomic fragment of the
nucleic material according to any one of claims 1 to 7 or
a fragment according to any one of claims 14 to 17.

23. Peptide encoded by any open reading frame belonging to
a nucleotide fragment according to any one of claims 14 to
17, in particular a polypeptide, for example oligopeptide
forming or comprising an antigenic determinant recognized
by sera of patients infected with the MSRV-1 virus, and/or
in whom the MSRV-1 virus has been reactivated.

24. Peptide according to claim 23 comprising a sequence
identical, partially or completely, or equivalent to a
sequence chosen from SEQ ID NO: 113, SEQ ID NO: 115, SEQ
ID NO: 118, SEQ ID NO: 121, SEQ ID NO: 135 and SEQ ID NO:
137.

25. Diagnostic, prophylactic or therapeutic composition,
in particular for inhibiting the expression of at least
one retrovirus associated with multiple sclerosis and/or
rheumatoid arthritis, comprising a nucleotide fragment
according to any one of claims 14 to 17.

26. Method for detecting a retrovirus associated with
multiple sclerosis and/or rheumatoid arthritis, in a
biological sample, characterized in that an RNA and/or a
DNA assumed to belong to or obtained from said retrovirus,
or their complementary RNA and/or DNA, is brought into
contact with a composition comprising a nucleotide
fragment according to any one of claims 14 to 17.

SEQUENCE LISTING

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68 
GACTCGCTGC AGATCGATTT TTTTTTTTTT TTTT 

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69 
GCCATCAAGC CACCCAAGAA CTCTTAACTT 

(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 
CCAATAGCCA GACCATTATA TACACTAATT 

(2) INFORMATION FOR SEQ ID NO: 112:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 310 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112:

GCTTATAGAA GGACCCCTAG TATGGGGTAA TCCCCTCTGG GAAACCAAGC CCCAGTACTC 60
AGCAGGAAAA ATAGAATAGG AAACCTCACA AGGACATACT TTCCTCCCCT CCAGATGGCT 120
AGCCACTGAG GAAGGAAAAA TACTTTCACC TGCAGCTAAC CAACAGAAAT TACTTAAAAC 180
CCTTCACCAA ACCTTCCACT TAGGCATTGA TAGCACCCAT CAGATGGCCA AATTATTATT 240
TACTGGACCA GGCCTTTTCA AAACTATCAA GAAGATAGTC AGGGGCTGTG AAGTGTGCCA 300
AAGAAATAAT 310

(2) INFORMATION FOR SEQ ID NO: 113: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 103 amino acids 

(B) TYPE: peptide 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113: 

LIEGPLVWGNPLWETKPQYSAGKIEXETSQGHTFLPSRWLATEEGKILSPAANQQKLLKTLHQTFHLGID
STHQMAKLLFTGPGLFKTIKKIVRGCEVCQRNN
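
The relationship between SEQ ID NO: 112 and SEQ ID NO: 113 can be checked by translating the nucleotide listing. The sketch below is an illustration, not part of the original description: it assumes the standard genetic code, reads the listing rows as printed above, starts translation at the second nucleotide, and writes a stop codon as "X" (as the listing appears to do). The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SEQ_ID_112_ROWS = """
GCTTATAGAA GGACCCCTAG TATGGGGTAA TCCCCTCTGG GAAACCAAGC CCCAGTACTC 60
AGCAGGAAAA ATAGAATAGG AAACCTCACA AGGACATACT TTCCTCCCCT CCAGATGGCT 120
AGCCACTGAG GAAGGAAAAA TACTTTCACC TGCAGCTAAC CAACAGAAAT TACTTAAAAC 180
CCTTCACCAA ACCTTCCACT TAGGCATTGA TAGCACCCAT CAGATGGCCA AATTATTATT 240
TACTGGACCA GGCCTTTTCA AAACTATCAA GAAGATAGTC AGGGGCTGTG AAGTGTGCCA 300
AAGAAATAAT 310
"""

SEQ_ID_113 = (
    "LIEGPLVWGNPLWETKPQYSAGKIEXETSQGHTFLPSRWLATEEGKILSPAANQQKLLKTLHQTFHLGID"
    "STHQMAKLLFTGPGLFKTIKKIVRGCEVCQRNN"
)


def parse_listing_rows(text: str) -> str:
    """Keep only the letters: drops spaces, line breaks and position counters."""
    return "".join(ch for ch in text.upper() if ch.isalpha())


def translate(dna: str, offset: int = 0, stop: str = "X") -> str:
    """Translate complete codons starting at a 0-based offset.

    Codons containing an ambiguous base (e.g. N) are also written as "X".
    """
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X").replace("*", stop) for c in codons)


seq_112 = parse_listing_rows(SEQ_ID_112_ROWS)
peptide = translate(seq_112, offset=1)  # second reading frame
print(len(seq_112), len(peptide), peptide == SEQ_ID_113)
```

Because the parser ignores digits and whitespace, it tolerates position counters that were split by text extraction (for example "18 0" for 180).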



(2) INFORMATION FOR SEQ ID NO: 114: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 635 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114: 
CCCTGTATCT TTAACCTCCT TGTTAAGTTT GTCTCTTCCA GAATCAAAAC TGTAAAACTA 60 
CAAATTGTTC TTCAAATGGA GCACCAGATG GAGTCCATGA CTAAGATCCA CCGTGGACCC 120 
CTGGACCGGC CTGCTAGCCC ATGCTCCGAT GTTAATGACA TTGAAGGCAC CCCTCCCGAG 180 
GAAATCTCAA CTGCACAACC CCTACTATGC CCCAATTCAG CGGGAAGCAG TTAGAGCGGT 240
CATCAGCCAA CCTCCCCAAC AGCACTTGGG TTTTCCTGTT GAGAGGGGGG ACTGAGAGAC 300
AGGACTAGCT GGATTTCCTA GGCCAACGAA GAATCCCTAA GCCTAGCTGG GAAGGTGACT 360
GCATCCACCT CTAAACATGG GGCTTGCAAC TTAGCTCACA CCCGACCAAT CAGAGAGCTC 420
ACTAAAATGC TAATTAGGCA AAAATAGGAG GTAAAGAAAT AGCCAATCAT CTATTGCCTG 480
AGAGCACAGC GGGAGGGACA AGGATCGGGA TATAAACCCA GGCATTCGAG CCGGCAACGG 540
CAACCCCCTT TGGGTCCCCT CCCTTTGTAT GGGCGCTCTG TTTTCACTCT ATTTCACTCT 600
ATTAAATCTT GCAACTGAAA AAAAAAAAAA AAAAA 635 

(2) INFORMATION FOR SEQ ID NO: 115: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: peptide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115: 

PCIFNLLVKFVSSRIKTVKLQIVLQMEHQMESMTKIHRGPLDRPASPCSDVNDIEGTPPEEISTAQPLLC
PNSAGSS



(2) INFORMATION FOR SEQ ID NO: 116: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116: 
TGGGGTTCCA TTTGTAAGAC CATCTGTAGC TT 32 
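
Several claims extend to "complementary sequences", and primers are often written as the reverse complement of the strand they anneal to. As an illustration, the sketch below computes a reverse complement and checks it against the listing as printed here: SEQ ID NO: 116 reads as the reverse complement of the last 32 nucleotides of SEQ ID NO: 117. The function name is an assumption, and only the final row and a half of SEQ ID NO: 117 (nucleotides 1441 to 1481) is reproduced.

```python
# Watson-Crick pairing; N (unknown base) is left as N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


SEQ_ID_116 = "TGGGGTTCCATTTGTAAGACCATCTGTAGCTT"
# Nucleotides 1441 to 1481 of SEQ ID NO: 117, as printed in the listing.
SEQ_ID_117_TAIL = "GAAGCTGTAAAGCTACAGATGGTCTTACAAATGGAACCCCA"

anneals_at_3_prime_end = SEQ_ID_117_TAIL.endswith(reverse_complement(SEQ_ID_116))
print(reverse_complement(SEQ_ID_116), anneals_at_3_prime_end)
```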

(2) INFORMATION FOR SEQ ID NO: 117: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1481 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117: 
ATGGCCCTCC CTTATCATAC TTTTCTCTTT ACTGTTCTCT TACCCCCTTT CGCTCTCACT 60
GCACCCCCTC CATGCTGCTG TACAACCAGT AGCTCCCCTT ACCAAGAGTT TCTATGAAGA 120
ACGCGGCTTC CTGGAAATAT TGATGCCCCA TCATATAGGA GTTTATCTAA GGGAAACTCC 180
780
ATAGTCTGCC TACCCTCAGG AATATTTTTT GTCTGTGGTA CCTCAGCCTA TCATTGTTTG 840
AATGGCTCTT CAGAATCTAT GTGCTTCCTC TCATTCTTAG TGCCCCCTAT GACCATCTAC 900
ACTGAACAAG ATTTATACAA TCATGTCGTA CCTAAGCCCC ACAACAAAAG AGTACCCATT 960
CTTCCTTTTG TTATCAGAGC AGGAGTGCTA GGCAGACTAG GTACTGGCAT TGGCAGTATC 1020
ACAACCTCTA CTCAGTTCTA CTACAAACTA TCTCAAGAAA TAAATGGTGA CATGGAACAG 1080
GTCACTGACT CCCTGGTCAC CTTGCAAGAT CAACTTAACT CCCTAGCAGC AGTAGTCCTT 1140
CAAAATCGAA GAGCTTTAGA CTTGCTAACC GCCAAAAGAG GGGGAACCTG TTTATTTTTA 1200
GGAGAAGAAC GCTGTTATTA TGTTAATCAA TCCAGAATTG TCACTGAGAA AGTTAAAGAA 1260
ATTCGAGATC GAATACAATG TAGAGCAGAG GAGCTTCAAA ACACCGAACG CTGGGGCCTC 1320
CTCAGCCAAT GGATGCCCTG GGTTCTCCCC TTCTTAGGAC CTCTAGCAGC TCTAATATTG 1380
TTACTCCTCT TTGGACCCTG TATCTTTAAC CTCCTTGTTA AGTTTGTCTC TTCCAGAATT 1440
GAAGCTGTAA AGCTACAGAT GGTCTTACAA ATGGAACCCC A 1481



(2) INFORMATION FOR SEQ ID NO: 118: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 493 amino acids

(B) TYPE: peptide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118: 
MALPYHTFLFTVLLPPFALTAPPPCCCTTSSSPYQEFLXRTRLPGNIDAPSYRSLSKGNSTFTAHTHMPR
NCYNSATLCMHANTHYWTGKMINPSCPGGLGATVCWTYFTHTSMSDGGGIQGQAREKQVKEAISQLTRGH
STPSPYKGLVLSKLHETLRTHTRLVSLFNTTLTRLHEVSAQNPTNCWMCLPLHFRPYISIPVPEQWNNFS
TEINTTSVLVGPLVSNLEITHTSNLTCVKFSNTIDTTSSQCIRWVTPPTRIVCLPSGIFFVCGTSAYHCL
NGSSESMCFLSFLVPPMTIYTEQDLYNHVVPKPHNKRVPILPFVIRAGVLGRLGTGIGSITTSTQFYYKL
SQEINGDMEQVTDSLVTLQDQLNSLAAVVLQNRRALDLLTAKRGGTCLFLGEERCYYVNQSRIVTEKVKE
IRDRIQCRAEELQNTERWGLLSQWMPWVLPFLGPLAALILLLLFGPCIFNLLVKFVSSRIEAVKLQMVLQ
MEP

(2) INFORMATION FOR SEQ ID NO: 119:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119: 
TCAAAATCGA AGAGCTTTAG ACTTGCTAAC CG 32 



(2) INFORMATION FOR SEQ ID NO: 120: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1329 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120:

TCAAAATCGA AGAGCTTTAG ACTTGCTAAC CGCCAAAAGA GGGGGAACCT GTTTATTTTT 60
AGGGGAAGAA TGCTGTTAGT ATGTTAATCA ATCTGGAATC ATTACTGAGA AAGTTAAAGA 120
AATTTGAGAT CGAATATAAT GTAGAGCAGA GGACCTTCAA AACACTGCAC CCTGGGGCCT 180
CCTCAGCCAA TGGATGCCCT GGACTCTCCC CTTCTTAGGA CCTCTAGCAG CTATAATATT 240
TTTACTCCTC TTTGGACCCT GTATCTTCAA CTTCCTTGTT AAGTTTGTCT CTTCCAGAAT 300
TGAAGCTGTA AAGCTACAAA TAGTTCTTCA AATGGAACCC CAGATGCAGT CCATGACTAA 360
AATCTACCGT GGACCCCTGG ACCGGCCTGC TAGACTATGC TCTGATGTTA ATGACATTGA 420
AGTCACCCCT CCCGAGGAAA TCTCAACTGC ACAACCCCTA CTACACTCCA ATTCAGTAGG 480
AAGCAGTTAG AGCAGTTGTC AGCCAACCTC CCCAACAGTA CTTGGGTTTT CCTGTTGAGA 540
GGGTGGACTG AGAGACAGGA CTAGCTGGAT TTCCTAGGCT GACTAAGAAT CCCNAAGCCT 600
ANCTGGGAAG GTGACCGCAT CCATCTTTAA ACATGGGGCT TGCAACTTAG CTCACACCCG 660
ACCAATCAGA GAGCTCACTA AAATGCTAAT CAGGCAAAAA CAGGAGGTAA AGCAATAGCC 720
AATCATCTAT TGCCTGAGAG CACAGCGGGA AGGACAAGGA TTGGGATATA AACTCAGGCA 780
TTCAAGCCAG CAACAGCAAC CCCCTTTGGG TCCCCTCCCA TTGTATGGGA GCTCTGTTTT 840
CACTCTATTT CACTCTATTA AATCATGCAA CTGCACTCTT CTGGTCCGTG TTTTTTATGG 900
CTCAAGCTGA GCTTTTGTTC GCCATCCACC ACTGCTGTTT GCCACCGTCA CAGACCCGCT 960
GCTGACTTCC ATCCCTTTGG ATCCAGCAGA GTGTCCACTG TGCTCCTGAT CCAGCGAGGT 1020
ACCCATTGCC ACTCCCGATC AGGCTAAAGG CTTGCCATTG TTCCTGCATG GCTAAGTGCC 1080
TGGGTTTGTC CTAATAGAAC TGAACACTGG TCACTGGGTT CCATGGTTCT CTTCCATGAC 1140
CCACGGCTTC TAATAGAGCT ATAACACTCA CCGCATGGCC CAAGATTCCA TTCCTTGGTA 1200
TCTGTGAGGC CAAGAACCCC AGGTCAGAGA ANGTGAGGCT TGCCACCATT TGGGAAGTGG 1260
CCCACTGCCA TTTTGGTAGC GGCCCACCAC CATCTTGGGA GCTGTGGGAG CAAGGATCCC 1320
CCAGTAACA 1329

(2) INFORMATION FOR SEQ ID NO: 121:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 162 amino acids

(B) TYPE: peptide

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121:

QNRRALDLLTAKRGGTCLFLGEECCXYVNQSGIITEKVKEIXDRIXCRAEDLQNTAPWGLLSQWMPWTLP
FLGPLAAIIFLLLFGPCIFNFLVKFVSSRIEAVKLQIVLQMEPQMQSMTKIYRGPLDRPARLCSDVNDIE
VTPPEEISTAQPLLHSNSVGSS

(2) INFORMATION FOR SEQ ID NO: 122:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: base pairs

(B) TYPE: nucleotide

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122:
GGCATTGATA GCACCCATCA G 21

(2) INFORMATION FOR SEQ ID NO: 123:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleotide

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123:
CATGTCACCA GGGTGGAATA G 21

(2) INFORMATION FOR SEQ ID NO: 124:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: base pairs

(B) TYPE: nucleotide

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:



(2) INFORMATION FOR SEQ ID NO: 126:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126: 
CATGTCACCA GGGTGGAATA G 25 



(2) INFORMATION FOR SEQ ID NO: 127: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127: 
GGACAGGAAA GTAAGACTGA GAAGGC 26

(2) INFORMATION FOR SEQ ID NO: 128:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128:
See Example 5

(2) INFORMATION FOR SEQ ID NO: 129:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129: 
See Example 5 

(2) INFORMATION FOR SEQ ID NO: 130: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1511 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130:

CCTAGAACGT ATTCTGGAGA ATTGGGACCA ATGTGACACT CAGACGCTAA GAAAGAAACG 60
ATTTATATTC TTCTGCAGTA CCGCCTGGCC ACAATATCCT CTTCAAGGGA GAGAAACCTG 120
GCTTCCTGAG GGAAGTATAA ATTATAACAT CATCTTACAG CTAGACCTCT TCTGTAGAAA 180
GGAGGGCAAA TGGAGTGAAG TGCCATATGT GCAAACTTTC TTTTCATTAA GAGACAACTC 240
ACAATTATGT AAAAAGTGTG GTTTATGCCC TACAGGAAGC CCTCAGAGTC CACCTCCCTA 300
CCCCAGCGTC CCCTCCCCGA CTCCTTCCTC AACTAATAAG GACCCCCCTT TAACCCAAAC 360
GGTCCAAAAG GAGATAGACA AAGGGGTAAA CAATGAACCA AAGAGTGCCA ATATTCCCCG 420
ATTATGCCCC CTCCAAGCAG TGAGAGGAGG AGAATTCGGC CCAGCCAGAG TGCCTGTACC 480
TTTTTCTCTC TCAGACTTAA AGCAAATTAA AATAGACCTA GGTAAATTCT CAGATAACCC 540
TGACGGCTAT ATTGATGTTT TACAAGGGTT AGGACAATCC TTTGATCTGA CATGGAGAGA 600
TATAATGTTA CTACTAAATC AGACACTAAC CCCAAATGAG AGAAGTGCCG CTGTAACTGC 660
AGCCCGAGAG TTTGGCGATC TTTGGTATCT CAGTCAGGCC AACAATAGGA TGACAACAGA 720
GGAAAGAACA ACTCCCACAG GCCAGCAGGC AGTTCCCAGT GTAGACCCTC ATTGGGACAC 780
AGAATCAGAA CATGGAGATT GGTGCCACAA ACATTTGCTA ACTTGCGTGC TAGAAGGACT 840
GAGGAAAACT AGGAAGAAGC CTATGAATTA CTCAATGATG TCCACTATAA CACAGGGAAA 900
GGAAGAAAAT CTTACTGCTT TTCTGGACAG ACTAAGGGAG GCATTGAGGA AGCATACCTC 960
CCTGTCACCT GACTCTATTG AAGGCCAACT AATCTTAAAG GATAAGTTTA TCACTCAGTC 1020
AGCTGCAGAC ATTAGAAAAA ACTTCAAAAG TCTGCCTTAG GCCCGGAGCA GAACTTAGAA 1080
ACCCTATTTA ACTTGGCATC CTCAGTTTTT TATAATAGAG ATCAGGAGGA GCAGGCGAAA 1140
CGGGACAAAC GGGATAAAAA AAAAAGGGGG GGTCCACTAC TTTAGTCATG GCCCTCAGGC 1200
AAGCAGACTT TGGAGGCTCT GCAAAAGGGA AAAGCTGGGC AAATCAAATG CCTAATAGGG 1260
CTGGCTTCCA GTGCGGTCTA CAAGGACACT TTAAAAAAGA TTATCCAAGT AGAAATAAGC 1320
CGCCCCCTTG TCCATGCCCC TTACGTCAAG GGAATCACTG GAAGGCCCAC TGCCCCAGGG 1380
GATGAAGATA CTCTGAGTCA GAAGCCATTA ACCAGATGAT CCAGCAGCAG GACTGAGGGT 1440
GCCCGGGGCG AGCGCCAGCC CATGCCATCA CCCTCACAGA GCCCCGGGTA TGTTTGACCA 1500
TTGAGAGCCA A 1511



(2) INFORMATION FOR SEQ ID NO: 131: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 347 amino acids 

(B) TYPE: peptide 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131: 

LERILENWDQCDTQTLRKKRFIFFCSTAWPQYPLQGRETWLPEGSINYNIILQLDLFCRKEGKWSEVPYV
QTFFSLRDNSQLCKKCGLCPTGSPQSPPPYPSVPSPTPSSTNKDPPLTQTVQKEIDKGVNNEPKSANIPR
LCPLQAVRGGEFGPARVPVPFSLSDLKQIKIDLGKFSDNPDGYIDVLQGLGQSFDLTWRDIMLLLNQTLT
PNERSAAVTAAREFGDLWYLSQANNRMTTEERTTPTGQQAVPSVDPHWDTESEHGDWCHKHLLTCVLEGL
RKTRKKPMNYSMMSTITQGKEENLTAFLDRLREALRKHTSLSPDSIEGQLILKDKFITQSAADIRKNFKS
LP



(2) INFORMATION FOR SEQ ID NO: 132: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132: 
TGCTGGAATT CGGGATCCTA GAACGTATTC 30 



(2) INFORMATION FOR SEQ ID NO: 133: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133:
AGTTCTGCTC CGAAGCTTAG GCAGACTTTT 

(2) INFORMATION FOR SEQ ID NO: 135: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: amino acids 

(B) TYPE: peptide 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135: 

MGSSHHHHHHSSGLVPRGSHMASMTGGQQMGRILERILENWDQCDTQTLRKKRFIFFCSTAWPQYPLQGR
ETWLPEGSINYNIILQLDLFCRKEGKWSEVPYVQTFFSLRDNSQLCKKCGLCPTGSPQSPPPYPSVPSPT
PSSTNKDPPLTQTVQKEIDKGVNNEPKSANIPRLCPLQAVRGGEFGPARVPVPFSLSDLKQIKIDLGKFS
DNPDGYIDVLQGLGQSFDLTWRDIMLLLNQTLTPNERSAAVTAAREFGDLWYLSQANNRMTTEERTTPTG
QQAVPSVDPHWDTESEHGDWCHKHLLTCVLEGLRKTRKKPMNYSMMSTITQGKEENLTAFLDRLREALRK
HTSLSPDSIEGQLILKDKFITQSAADIRKNFKSLPKLAAALEHHHHHH



(2) INFORMATION FOR SEQ ID NO: 137: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: amino acids 

(B) TYPE: peptide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137: 

MASMTGGQQMGRILERILENWDQCDTQTLRKKRFIFFCSTAWPQYPLQGRETWLPEGSINYNIILQLDLF 
CRKEGKWSEVPYVQTFFSLRDNSQLCKKCGLCPTGSPQSPPPYPSVPSPTPSSTNKDPPLTQTVQKEIDK 
GVNNEPKSANIPRLCPLQAVRGGEFGPARVPVPFSLSDLKQIKIDLGKFSDNPDGYIDVLQGLGQSFDLT 
WRDIMLLLNQTLTPNERSAAVTAAREFGDLWYLSQANNRMTTEERTTPTGQQAVPSVDPHWDTESEHGDW 
CHKHLLTCVLEGLRKTRKKPMNYSMMSTITQGKEENLTAFLDRLREALRKHTSLSPDSIEGQLILKDKFI
TQSAADIRKNFKSLPKLAAALEHHHHHH 

(2) INFORMATION FOR SEQ ID NO: 138:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138:
CTTGGAGGGT GCATAACCAG GGAAT 25

(2) INFORMATION FOR SEQ ID NO: 139:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139: 
TGTCCGCTGT GCTCCTGATC 20 

(2) INFORMATION FOR SEQ ID NO: 140:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140: 
CTATGTCCTT TTGGACTGTT TGGGT 25

(2) INFORMATION FOR SEQ ID NO: 141:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 764 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141: 
TGTCCGCTGT GCTCCTGATC CAGCACAGGC GCCCATTGCC TCTCCCAATT GGGCTAAAGG 60 
CTTGCCATTG TTCCTGCACA GCTAAGTGCC TGGGTTCATC CTAATCGAGC TGAACACTAG 120 
TCACTGGGTT CCACGGTTCT CTTCCATGAC CCATGGCTTC TAATAGAGCT ATAACACTCA 180 
CTGCATGGTC


C A AG AT TCP A 


TTCCTTC^AA 


tcp^tgagap 


PAAHAAPPPP 


APPTPAPapn 

nuVJ A ^.rA.'o.rAvj/A 


O AC\ 

Z.H\J 


ACACAAGGCT 


tgccaccatg 


TT^nAAncAf; 


CCCACCACCA 


TTTTHHAAPP 


Anppppppar 


JUU 


TATCTTGGC^A 




CAAnnArrrr 


A Pi fiT A AC A AT 


TTHPTHAPPA 






TGAATCCflCA 


^Av-* V .TA A VjrtJ^OvJ 




PPAATTP^AA 


ATPTTPPTPP 
r\ A o A A k_*V_* A V_,V_ 




H z. u 


ATGCCCC'T A A 


f^AT^TATTPT 

A O A 11^/1 


uunonn A loo 






M.o 1 MTALj/a/A-HJa. 


4 o u 


AAATGACTTA TATTCTTCTG CAGTACCGCC CTGGCCACGA TATCCTCTTC AAGGGGGAGA 540
AACCTGGCCT CCTGAGGGAA GTATAAATTA TAACACCATC TTACAGCTAG ACCTGTTTTG 600
TAGAAAAGGA GGCAAATGGA GTGAAGTGCC ATATTTACAA ACTTTCTTTT CATTAAAAGA 660
CAACTCGCAA TTATGTTAAC AGTGTGATTT GTGTTCCTAC ACGGAAGCCC TCAGATTCTA 720
CTCCCCACCC CCGGCATCTC CCCTGAATCC CTCCCCAACT TATT

(2) INFORMATION FOR SEQ ID NO: 142:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 800 base pairs 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142:

TGTCCGCTGT GCTCCTGATC CAGCACAGGC GCCCATTGCC TCTCCCAATT GGGCTAAAGG 60
CTTGCCATTG TTCCTGCACA GCTAAGTGCC TGGGTTCATC CTAATCGAGC TGAACACTAG 120
TCACTGGGTT CCACGGTTCT CTTCCATGAC CCATGGCTTC TAATAGAGCT ATAACACTCA 180
CTGCATGGTC CAAGATTCCA TTCCTTGGAA TCCGTGAGAC CAAGAACCCC AGGTCAGAGA 240
ACACAAGGCT TGCCACCATG TTGGAAGCAG CCCACCACCA TTTTGGAAGC GGCCCGCCAC 300
TATCTTGGGA GCTCTGGGAG CAAGGACCCC CAGGTAACAA TTTGGTGACC ACGAAGGGAC 360
CTGAATCCGC AACCATGAAG GGATCTCCAA AGCAATTGGA AATGTTCCTC CCAAGGCAAA 420
AATGCCCCTA AGATGTATTC TGGAGAATTG GGACCAATCT GACCCTCAGA CAGTAAGAAA 480
AAAAATGACT TATATTCTTC TGCAGTACCG CCTGGCCACG GATATCCTCT TCAAGGGGGA 540
GAAACCTGGC CTCCTGAGGG AAGTATAAAT TATAACACCA TCTTACAGCT AGACCTGTTT 600
TGTAGAAAAG GAGGCAAATG GAGTGAAGTG CCATATTTAC AAACTTTCTT TTCATTAAAA 660
GACAACTCGC AATTATGTAA ACAGTGTGAT TTGTGTCCTA CAGGAAGCCC TCAGATCTAC 720
CTCCCTACCC CGGCATCTCC CTGACTCCTT CCCCAACTAA TAAGGACCCA CTTCAGCCCA 780
AACAGTCCAA AAGGACATAG

ABSTRACT

RETROVIRAL NUCLEIC MATERIAL AND NUCLEOTIDE FRAGMENTS, IN PARTICULAR
ASSOCIATED WITH MULTIPLE SCLEROSIS AND/OR RHEUMATOID ARTHRITIS, FOR 
DIAGNOSTIC, PROPHYLACTIC AND THERAPEUTIC USES 



Nucleic material, in isolated or purified state, comprising a 
nucleotide sequence chosen from the group which consists of (i) 
the sequences SEQ ID NO: 112, SEQ ID NO: 114, SEQ ID NO: 117, 
SEQ ID NO: 120, SEQ ID NO: 124, SEQ ID NO: 130, SEQ ID NO: 141 
and SEQ ID NO: 142; (ii) the sequences complementary to 
sequences (i); and (iii) the sequences equivalent to sequences 
(i) or (ii), in particular the sequences having, for every 
series of 100 contiguous monomers, at least 50%, and 
preferentially at least 70% homology with sequences (i) or (ii) 
respectively, and uses for detecting a retrovirus associated 
with multiple sclerosis and/or rheumatoid arthritis. 

FIG 2

[Figure: nucleotide sequence corresponding to SEQ ID NO: 112 (positions 1 to 310), printed in rows of 50 under a position ruler, with the amino acid translation of each of the three reading frames beneath every row.]

FIG 2 (continued)

[Figure: nucleotide sequence corresponding to SEQ ID NO: 114, positions 1 to 350, printed in rows of 50 under a position ruler, with the amino acid translation of each of the three reading frames beneath every row.]

FIG 3

[Figure: nucleotide sequence corresponding to SEQ ID NO: 114, positions 351 to 635, in the same layout.]

FIG 4

[Figure: nucleotide sequence corresponding to SEQ ID NO: 117, positions 1 to 350 and 701 to 1481, printed in rows of 50 under a position ruler, with the amino acid translation of each of the three reading frames beneath every row.]

FIG 5

TCAAAATOGA AGAGCTTEAG ACTTOCTAAC OGQCAAAAGA GGGGGAAOCT 50
SKSK SFR LAN RQKR GNL 
QNR RALD LLT AKR GGTC 
KIE EL. TC.P PKE GEP 

Gl ' l ' im ' l ' ri ' i ' AGQGGAAGAA IGL'lUi'ilAGT ATGTTAATCA ATUIGGAATC 100 
FIF RGRM LLV C.S IWNH 
LFL GEE CC.Y VNQ SGI 
VYF. GKN AVS MLIN LES 

ATEACIGAGA AAGTTAAAGA AATTIGAGAT OGAATATAAT GTAGAGCAGA 150 

Y.E S.R NLRS NIM .SR 
ITEK VKE I.D RI .C RAE 
LLR KLKK FEI EYN VEQR 

QGAGCTICAA AACACTGCAC CCTGGQGCCT OCIGAGGCAA TOGATOGOCT 200 
GPSK HCT LGP PQPM DAL 
DLQ NTAP WGL LSQ WMPW 
TFK TLH PGAS SAN GCP 

GGACIUIGOC CTICTEAQGA CCTCEAGCAG CTATAATATT TTTACIGCIC 250 
, DSP LLRT SSS YNI FTPL 

TLP FLG PLAA IIF LLL 
.GLSP S.D L.Q L ..YF YSS 

TTTOGAOQCT GTATCTICAA CTTOJi'lUl'l' AAGTTIGTCT CITCCAGAAT 300 

WTL Y LQ LPC. VCL FQN 
FGPC IFN FLV KFVS SRI 
LDP VSST SLL SLS LPEL 

TGAAGCTGTA AAGCTACAAA TAGl'lUl'lCA AAIGGAAGCC CAGATOCAGT 350 
. SCK ATN SSS NGTP DAV 
EAV KL QI VLQ MEP QMQS 
KL. SYK F F K WNP RCS 
FIG 5 (continued)

10 20 30 40 50

1234567890 1234567890 1234567890 1234567890 1234567890

CCATCACTAA AATCTAOOGT QGAQQOCTGG AOQQQQCIGC TAGACTATGC 400 
HD. NLPW TPG PAC T M L 

M T K IYR GPLD RPA RLC 
P.LK STV DPW T G L L DYA 

TCIGATGTTA A1GACATIGA AGTCACCOCT OOOGAGGAAA TCICAACTGC 450 

C. H SHPS RGN LNC 

SDVN DIE VTP P EEI STA 
LML MTLK S P L PRK SQLH 

ACAAOCOCTA CTACACIOCA ATICAGTAGG AAGCAGTIAG AQCAGTIGIC 500 
TTPT TLQ FSR KQLE QLS 
QPL LHSN SVG SS. SSCQ 
NPY YTP IQ.E A V R AVV 

AGCCAAGCTC GCCAACAGTA CTIGGGTTTT GCTGTIGAGA GGGIGGACIG 550 
ANL PNST WVF LLR GWTE 
PTS PTV L.GFS C.E GGL 
SQPP QQY LGF PVER VD. 

AGAGACAQGA CTAGCTGGAT TTGCTAQQCT GACTAAGAAT GGGNAAQQCT 600 

RQD L D FLG. L R I PKP 

RDRT SWI S.A D.ES XSL 
ETG LAGF PRL TKN PXAX 

ANCIQGGAAG GIGAOGGCAT CCATCTITAA ACA1GGGGCT TQCAACTIAG 650 
XWEG DRI H L TWGL QLS 

XGK VTAS IFK H G A CNLA 
LGR .PH PSLN MGL AT. 

CTCACAQQOG AGCAATCAGA GAGCTCACTA AAATGCTAAT CAGGCAAAAA 700 
SH P TNQR AH. NAN QAKT 
HTR PIR ELTK M L I RQK 
LTPD QSE SSL KC.S GKN 
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CAQGAGGTAA. AGCAATAGOC AATCATCTAT TGOCTGAGAG CACAQGQQGA 750 

GGK A I A NHLL PES TAG 
QEVK Q.P I I Y C L R A QRE 
RR. SNSQ SSI A . E HSGK 

AGGACAAQGA TTGGGATATA AACTCAGQCA TTCAAGQCAG CAACAGGAAC 800 
RTRI G I . TQA FKPA TAT 
GQG LGYK LRH SSQ QQQP 
DKD WDI NSGI QAS NSN 

CnXTTTGGG TCCCCTCCCA TIGTATGQGA GCICIU1T1T CACICTATTT 850 
PFG SPPI VWE LCF HSIS 
PLG PLP LYG S SVF TLF 
PLWV PSH CMG ALFS LYF 

CACICEATTA AATCATGGAA OTXACTCTT CTGGTOOGTG TTTTTTATOG 900 

L y imq lhss gpc FLW 

HSIK SCN CTL LVRV FYG 
TLL NHAT ALF WSV FFMA 

CTCAAGCTGA GCl'l'l'lUl'lC GOCATCCAOC ACIGCTGTTT GQCAGOGTCA 950 
LKLS FCS PST TAVC HRH 
SS. AFVR HPP LLF ATVT 
QAE LLF AIHH CCL PPS 

CAGAOCTXXT GCTGACTTOC ATCXXTTTOG ATCCAGCAGA GIGTCCACIG 1000 
RPA ADFH PFG SSR VSTV- 
DPL LTS IPLD PAE CPL 
QTRC .LP SLW IQQS VHC 

TGCKX7IGAT OCAQGGAGGT AOCCATTGQC ACTCOOGATC AQGCTAAAGG 1050 

LLI QRG THCH SRS G.R 
CS.S SEV PIA TPDQ AKG 
APD PARY PLP LPI RLKA 
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CTTGOCATTG TTGCTGCATG GCTAAGTGGC TGGGTTTGTC CIAATAGAAC 1100 
LAIV PAW LSA WVCP NRT 
LPL FLHG .VP GFV LIEL 
CHC SCM AKCL GLS . N 

TGAACACIGG TCACTGGGTT CCATGGTICT- CTIOCATGAC GCAGGGCTTC 1150 
EHW SLGS MVL FHD PRLL 

NTG HWV PWFS SMT HGF 
. TLV TGF HGS LP.P TAS 

TAATAGAGCT ATAACACTCA CO0CNK330C CAAGATIOCA TIDCITGGTA 1200 

IEL .HS PHGP RFH SLV 
. . SY NTH RMA QDSI PWY 
NRA ITLT AWP KIP FL GI 

TCIGTGAGGC CAAGAACGOC AGGTCAGAGA ANGTGAQQCT TGGCACCATT 1250 
SVRP RTP GQR X.GL PPF 
L.G QEPQ VRE X E A CHHL 
CEA KNP RSEX VRL ATI 

TGGGAAGTGG OQCACTGQCA TJ.T1GGTAGC GGOCCAOCAC CATCTTGGGA 1300 
GKW PTAI L V A AHH HLGS 
GSG PLP FW.R PTT ILG 
WEVA HCH FGS GPPP SWE 



GCTGTGGGAG CAAGGATOOC CCAGTAACA 1329
CGS KDP PVT
A V G A RIP Q.
LWE QGS P SN
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CCTAGAACCT ATTCTOGAGA ATIGGGACCA ATCTCACACT 0OO3CTAA 50 
PRTY SGE LGP M.HS DAK 
LER I L E N WDQ CDT QTLR 
. N V FWR IGTN VTL RR. 

GAAAGAAACG ATTTATATTC TICIQCAGTA GOGQCTOQGC ACAATATOCT 100 
KET I y! I L LQY R L A TISS 
KKR FIF FCST AWP QYP 
ERND LYS SAV PPGH NIL 

CTICAAGQGA GAGAAACCTG GLT1UC1GM3 GGAAGTATAA ATTATAACAT 150 

SRE RNL AS.G KYK L.H 
LQGR ETW L P E GSIN YNI 
FKG EKPG FLR EV. IITS 

CATCTEACAG CTAGACCTCT TCT3TAGAAA GGAGGGCAAA TGG£G?IGAfiG 200 
HLTA RPL L.K GGQM E.S 
ILQ L D L F CRK EGK WSEV 
SYS .TS SVER RAN GVK 

TOQCATAIGT GCAAaCTTTC TTTTCATEAA GSGACAACTC ACAATEATGT 250 
AIC ANFL FIK RQL TIM. 
PYV QTF FSLR DNS QLC 
CHMC KLS FH. ETTH NYV 

AAAAAGTGTG GTTIATQOOC TACAGGAMC CCTCAGAGTC CPCCTCCCTh 300 

KVW FMP YRKP SES TSL 
KKCG LCP TGS PQSP PPY 
KSV VYAL QEA LRV HLPT 

CDOCAGCGTC COCTGGGGGA ClOjnOCIC AACTAATASG GACOQGCCTT 350 
PQRP LPD S F L> N -G PPF 

PSV PSPT PSS TNK D P PL 
PAS PPR LLPQ LIR TPL 

TAACCCAAaC GGTCCAAA^G GAGATAGACA AAQGGGTAAA CAATCAAOZA 400 
NPN GPKG DRQ RGK Q.TK 
TQT VQK EIDK GVN NEP 
P K R SKR R-T KG-T MNQ 

AAGAGIGGCA ATATICCCCG ATTATGCCCC CTOCAAGCAG TGSGAGGAGG 450 

ECQ YSP IMPP PSS ERR 
KSAN IPR LCP LQAV RGG 
RVP IFPD YAP SKQ .EEE 

AGAATTCOQC CCPOOCPCPG TQCCTGTACC TTITTCTCTC TCAGACTTAA 500 
RIRP SQS ACT FFSL RLK 
EFG PARV PVP FSL SDLK 
NSA QPE CLYL FLS QT. 
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FIG 6 (continued) 
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/O^AAATTAA AATAGACCTA GGTAAATTCT CAGATAAOCC TCAOQGCTAT 550 
AN. NRPR . I L R.P .RLY 
QIK I D L GKFS DNP DGY 
SKLK T V N S QITL T A I 

ATIGATOITT TACAMQGTT AGGACAATOC TITGATCTGA CATOGAGAGA 600 

. CF TRV R T I L . S D MER 
IDVL Q G L GQS FDLT WRD 
LMF Y K G - DNP LI. HGEI 

TATAATCTTA CTACTAAATC AGACACTAAC OOCAAATGAG AGAAGTTCCDG 650 
YNVT TKS DTN PK.E KCR 
I M L LLNQ TLT PNE RSAA 
. CY Y.I RH.P QMR EVP 

CTCTAACIQC AGOQCGAGAG TTTQQCGATC TTTGCTATCT CTGICA33QC 700 
CNC SPRV WRS L V S QSGQ 
VTA ARE FGDL WYL SQA 
L . L Q PES LAI FGIS.VRP 

AACAATAGGA TGACAACAGA GGAAAGAACA ACTCXXACAG QXAGCA3QC 750 

Q.D DNR GKNN SHR PAG 
NNRM TTE ERT TPTG QQA 
TIG . Q Q R KEQ LPQ ASRQ 

MJTTOCCNJT GT3=GAaX7IC ATTO3GACAC AGAATCAGAA CAIGGA3AIT 800 
SSQC RPS LGH RIRT WRL 
VPS VDPH WDT ESE HGDW 
FPV . T L IGTQ NQNMEI 

GGTIOXACAA ACATTTOCIA ACITGOGTIGC TEGAAGGACT GfiGGAAAACT 850 
VPQ TFAN LRA RRT EEN. 
CHK HLL TCVL EGL RKT 
GATN IC. LAC .KD. GKL 

AGGAAGAAGC CTATGAATEA CTCAATCATC TOCACEATAA CftCaGQGAAA 900 

EEA YEL LNDV HYN TG K 
RKKP MNY SMM STIT QGK 
GRS L.IT Q . C PL. HRER 

GGAAGAAAAT CITACT3CTT TICIGGftCAG ACTAAGQGAG GCATIGAGGA . 950 
GRKS YCF SGQ TKGG IEE 
EE N LTAF LDR LRE ALRK 
K K I LLL FWTD . G R H.G 

^GCATAOCIC OCTC7ICAOCT GACTCTAITG AA3GOCAACT AAICITAAAG 1000 
AYL PVT. LY. RPT NLKG 
HTS LSP DSIE GQL ILK 
SIPP CHL TLL KAN. S.R 
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GATAAGTTTA TCACICAGTC AGCTOCAGAC ATTAGAAAAA ACTTCAAAA3 1050 

. vy hsv scrh . k k lqk 
dkfi tqs a a d irkn fks 
i s l slsq lqt l e k tskv 

tctodcttag gqccggagca gaacttagaa aocx7eattta actiogcatc 1100 
salg peq n l e tlfn l as 
lp. arsr t.k pyl t whp 
clr pga elrn pi. lgi 

cicaotitit tataatagag atcaggagga gcagqoc^aa cqggacaaac 1150 
s v f ynrd qee qak rdkr 
qff i i e irrs rrn gtn 
lsfl . .r sgg aget gqt 

gqgataaaaa aaaaagg3gg qgtocactac tttagicaig gcxxtcaggc 1200 

dkk krg gpll . s w psg 
gikk kgg vhy fshg pqa 
g.k kkgg stt lvm alrq 

aagca3acit togaggctct gcaaaaggga aaagctgggc aaatcaaatg 1250 
kqtl eal qkg kagq ikc 
srl wrlc k r e klg ksna 
adf ggs akgk swa nqm 

(xtaatagqg ciggcttoca giggqgtdcta caaqgacact ttaaaaaaga 1300 
lig lass avy kdt lkki 

. . g wlp vrst rtl . k r 
pnra gfq cgl qghf kkd 

teatoc^agt agaaaiaa3c ogcxxxxtto tocatoooa: ti20gicaag 1350 

iqv eis rplv hap yvk 
lsk. k.a apl smpl tsr 
yps rnkp ppc pcp lrqg 

ggaatcacig gamgcxxac toooocagqg gatoaagaea cicigagica 1400 
gitg rpt apg dedt lsq 
esl egpl pqg m k i l.vr 
nhw kah cprg .ry ses 

gaa3ccatta aqcagatgat ocagcagcag gacigagqgt qoooc3qgc3cg 1450 
kpl tr.s ssr teg arge 
s h . pdd paag lrv pga 
eain q m i qqq d.gc pgr 

agogocmoc catcqcatca coctcacaga go0o03ggtca 'lultigacca 1500 

rqp mps psqs pgy v .p 
sasp chh phr apgm fdh 
apa hait lte prv clti 
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TTGPGPGCCA A 1511 
L R A 
FIG 7



10 20 30 40 50 

1234567890 1234567890 1234567890 1234567890 1234567890 

ATOGGCAGCA GQZATCATCA TCATCATCAC AGCAGCGGQC TOGTOCGQOG 50 
MGSS HHH HHH SSGL VPR 

OGGCAGCCAT AIGQCTAGCA TGACIGGIGG ACAGCAAATG GGTCGGATQC 100 
GSH MASM TGG QQM GRIL 

TAGAAOCTAT TCIGGAGAAT TGGGACCAAT GIGACACTCA GAOGCTAAGA 150 
ERI LEN WDQC DTQ TLR 

AAGAAACGAT TTATATTCIT CTCCAGTACC GOCIGGGCAC AATATGCTCT 200 
KKRF IFF CST AWPQ YPL 

TCAAGGGAGA GAAAQC7IGGC TIGCIGAGGG AAGTATAAAT TATAACATCA 250 
QGR E T W L PEG SIN YNII 

TCITACAGCT AGACCTCTIC TGIAGAAAGG AGGGCAAATG GAGIGAAGIG 300 
L Q L DLF CRKE GKW SEV 

GCATATCIGC AAACTTTCIT TTCATTAAGA GACAACICAC AATTATUTAA 350 
PYVQ TFF SLR D NSQ LCK 

AAAGTGIGGT TTATGCXXTA CAGGAAGCCC TCAGACTOCA CCTOCCTACC 400 
KCG LCPT GSP QSP PPYP 

COO3GIG0C CTXXXTGACT GCTIGCTCAA CTAATAAGGA CXXXXCITTA 450 
SVP SPT PSST NKD PPL 

AGQCAAAOGG TGCAAAAGGA GATAGACAAA GGGGTAAACA ATGAAOCAAA 500 
TQTV QKE IDK GVNN EPK 

GAGIGCCAAT ATIXXXXGAT TATOOGQQCT O^AQCACTG AGAGGAQGAG 550 
SAN IPRL CPL Q A V RGGE 

AATIC3QQGC AGOCAGAGIG CCTCnAQCTT TITCCCICIC AGACITAAAG 600 
F G~ P A R V PVPF SLS DLK 

CAAATTAAAA TAGAGCTAGG TAAATICICA GATAAOQCIG ACGGCTATAT 650 
Q I K I DLG KFS DNPD GYI 

TGATGITITA CAAGGGTTAG GACAATQCTT TGATGIGACA 1GGAGAGATA 700 
DVL QGLG QSF DLT WRDI 

TAATGITACT ACTAAATCAG ACACTAACOC CAAAIG^G AAGIGGCGCT 750 
MLL LNQ TLTP NER SAA 

GTAACIGCAG COOG^GAGTT TGGQGATXZTT TGGTATCTCA GICAGGQCAA 800 
V T A A REF GDL WYLS QAN 
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FIG 7 (continued) 



10 20 30 40 50 

1234567890 1234567890 1234567890 1234567890 1234567890 

CAATAGGATG ACAACAGAGG AAAGAACAAC TOCCACAGGC CAGCAGQCAG 850 
NRM TTEE RTT PTG QQAV 

TTOOCAGICT AGAGQCICAT TOGGACACAG AA1CAGAACA TGGAGATTO3 900 
PSV DPH WDTE SEH GDW 

TGQCACAAAC ATTTGCTAAC TIGCXTPGCTA GAAGGACTGA GGAAAACTAG 950 
CHKH LLT CVL EGLR KTR 

GAA2AAGOCT ATGAATTACT CAATCATOTC CACTATAACA CAGGGAAAGG 1000 
KKP MNYS MMS TIT QGKE 

AAGAAAATCT TACTGCTTTT CIGGACAGAC TAAGGGAGGC ATIGAGGAAG 1050 
E N Li TAF LDRL REA LRK 

CATACCTQGC TGICACCTGA CTCTATTGAA GGOCAACTAA TCITAAAGGA 1100 
HTSL SPD SIE GQLI LKD 

TAAGITTATC ACICAGICAG CTGCAGACAT TAGAAAAAAC TTCAAAACTC 1150 
KFI TQSA ADI RKN FKSL 

TGGCTAAGCT TGOGGOOGCA CIGGAGCACC ACCAQC^CCA CCACTGAGAT 1200 
PKL AAA LEHH HHH H . D 

QOGQCIGCTA ACAAAGCOQG AAAQGAAGCT GAGITQGCIN GIGGCMA 1247 
PAAN KAR KEA ELAX G 
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ATGGCTAGCA TGACIGGIGG ACAGCAAATC GGOOQGATQC TAGAAOCTAT 50 
MASM TGG QQM GRIL ERI 

TCIGGAGAAT TQGGACCAAT GTGACACICA GACGCTAAGA AAGAAAOGAT 100 
L E N WDQC DTQ TLR KKRF 

TTATATTCTT CTGCAGTACC QOOTGQCCPJ2 AATATCCTUT TCAAGGGAGA 150 
IFF CS T AWPQ YPL QGR 

GAAACCT3GC TTCCIGAGGG AAGTTATAAAT TATAACATCA TOTACAGCT 200 
ETWL PEG SIN YNII L Q L 

AGACCTCTIC TCTAGAAAGG AGGGCAAATG GACTSAACTG CCATATCTGC 250 
D L F CRKE GKW SEV PYVQ 

AAA LTriClT TICATTAAGA GACAACICAC AATEAIGTAA AAAGIUIGGT 300 
TFF SLR DNSQ L C K KCG 

TTAraCCCTA CAGGAAGQCC TCAGAGICCA (XTCQCTAQC CXIAGQCTQGC 350 
L C P T GSP QSP PPYP SVP 

CIXXXXGACT CCITOCICAA CIAATAAGGA OOCXXXTITTA ACOCAAACX3G 400 
SPT PSST NKD P PL TQTV 

TCCAAAAQGA GATAGACAAA GGGGTAAACA ATCAACCAAA GAGIGCXAAT 450 
Q K E IDK GVNN EPK SAN 

ATICCCCGAT TATGOOQQCT CCAAGCAGIG AGAGGAGGAG AATICGGCOC 500 
IPRL CPL Q A V RGGE FGP 

AGOC^GAGIG OCTXHADCIT TTICICIUIC AGACTTAAAG CAAATEAAAA 550 
A R V PVPF SLS DLK QIKI 

TAGAGCTAGS TAAATICICA GATAACQCIG AGGQCTATAT TGATCHTTTA 600 
DLG KFS DNPD GYI DVL 

CAAGQGTEAG GACAATOC7IT TGATCIGACA TOGAGAGATA TAA3GITACT 650 
QGLG QSF DLT WRDI MLL 

ACTAAATCAG ACACTAAOOC CAAAIGAGAG AAGIGCOGCT GTAACIGCAG 700 
LNQ TLTP NER SAA V T A A 

COCGAGACTT TGQQGATdT TGGTATCICA GICAGQOCAA CAATAGGATG 750 
REF GDL WYLS Q A N NRM 

ACAACAGAGG AAAGAACAAC TQCCACAGGC CAGCAGGCAG TIOXACTGr 800 
TTEE RTT PTG QQAV PSV 
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AGAQOCICAT TQGGACACAG AATCAGAACA TOGAGATTQG TGCTACAAAC 850 
DPH WDTE SEH GDW CHKH 

ATITGCTAAC TTQQGTEX3CTA GAAGGACIGA QGAAAACTAG GAAGAAGOCT 900 
LLT CVL EGLR KTR KKP 

ATGAATTACT CAATGATCTTC CACEATAACA CAGQGAAAQG AAGAAAATCT 950 
MNYS MMS TIT QGKE ENL 

TACTOCTTTT CIGGACAGAC TAAGQGAQGC ATIGAGGAAG CATACCICCC 1000 
TAF LDRL REA LRK HTSL 

TGICACCIGA CICTATTGAA GGCCAACTAA TCTTAAAGGA TAAGTTTATC 1050 
SPD SIE GQLI LKD KFI 

ACICAGICAG CIGCAGACAT TAGAAAAAAC TICAAAAGTC TCCCTAAGCT 1100 
TQSA ADI RKN FKSL PKL 

TGOQGQQGCA CTCGAGCACC ACX^XAOCA CO^CTGAGAT CGGGC7IGCTA 1150 
AAA LEHH HHH H.D PAAN 

ACAAAGCCCG AAAGGAAGCT GAGITGGCIG GIGGCA 1186 
KAR KEA ELAG G 
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TGTCCGCTGT GCICCTGATC CAGCACAGQC GCOCATIGOC TCTOCCAATT 50 
CPLC S.S STG AHCL SQL 
VRC APDP A Q A PIA SPNW 
SAV LLI QHRR PLP L P I 

GGGCTAAAGG CTIGGCATTG TTCCIGCACA GCTAAGIGCC TGGGTTCATC 100 
G.R LAIV PAQ LSA WVHP 
AKG L P L FLHS .VPGFI 
GLKA CHC SCT A K C L GSS 

CTAATGGAGC TCAACACTAG TCACIGQGTT CCAGGGTICT CTTCCATGAC 150 

NRA EH. S L G S TVL FHD 
Li I E L NTS HWV PRFS SMT 
SS T L V TGF HGS LP.P 

GCATGGCTTC TAATAGAGCT ATAACACICA CTQCATQGTC CAAGATICCA 200 
PWLL IEL . H S LHGP RFH 
HGF SY NTH CMV QDSI 

MAS NRA ITLT AWS KIP 

TTCCTTGGAA TCGGIGAGAC CAAGAACCCC AGGTCAGAGA ACACAAQGCT 250 
SLE SVRP RTP GQR TQGL 
PWN P.D QEPQ VRE HKA 
FLGI RET KNP RSEN TRL 

TGCCAGCATG TTGGAAGCAG COCAGCACCA TTTTOGAAGC AGCOQGGCAC 300 

PPC WKQ PTTI LEA ARH 
CHHV GSS PPP FW KQ PAT 
A TM LEAA HHH FGS SPPL 

TATCTIGGGA GCIUIQQGAG CAAGGACCOC AGGTAACAAT TIQGTGACCA 350 
YLGS SGS KDP R.QF GDH 
ILG ALGA RTP GNN LVTT 
SWE LWE QGPQ VTI W . P 

GGAAGQGACC TGAATOOGCA AGCATGAAGG GATCIOCAAA GCAATTGGAA 400 
EGT I R N HEG ISK AIGN 

KGP ESA TMKG SPK QLE 
RRDL NPQ P.R DLQS NWK 
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ATGTTCCTCC CAAG3CAAAA ATGCCCCTAA CATGTATICT GGAGAATIGG 

VPP KAK MPLR CIL ENW 
MFLP RQK CP - DVFW R I G 
CSS QGKN APK MYS GELG 

GACCAATTTG ACCCTCAGAC AGTAAGAAAA AAATGACTTA TATTCTICTG 
DQFD PQT VRK K.LI FFC 
TNL TLRQ . E K NDL YSSA 
PI. PSD SKKK MTY ILL 

CAGTACCGQC CIGGCCACGA TAICCICTTC AAQGQGGAGA AACCTQGCCT 
STA LATI SSS RGR NLAS 
VPP WPR YPLQ G GE TWP 
QYRP GHD ILF KGEK PGL 

CCIGAGGGAA GTATAAATTA TAACACCATC TIACAQCIAG A0CT3TITIG 

GK YKL H H L TAR PVL 

PEGS INY NTI LQLD LFC 
L R E V.II TPS YS. TCFV 

TAGAAAAGGA QGCAAATGGA GTGAAGTGCC ATATTTACAA ACITICITTT 
KRR QME . S A IFTN FLF 
RKG GKWS EVP YLQ TFFS 
EKE ANG VKCH I Y K LSF 

CATTAAAAGA CAACTOGCAA TTATGTTAAC AGTGIGATTT GIGTICCTAC 
IKR QLAI MLT V . F VFLH 
LKD NSQ LC.Q CDL CSY 
H.KT TRN YVN SVIC VPT 

ACX3GAAGCCC TCAGATTCTA CTCCCCACCC CCQGCATCTC CCCIGAATCC 

GSP QIL LPTP GIS PES 
TEAL RFY SPP PASP LNP 
RKP SDST PHP RHL P. IP 

CTCCCCAACT TATT 
L P N L 
S P T Y 
P Q L I 



450 



500 
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-IGTOCGCTGT GCTCCTGATC CAGCACAQGC GQGCATIGQC TCTCGCAATT 50 
CPLC S.S STG AHCL SQL 
VRC APDP AQA P I A SPNW 
SAV LLI QHRR PLP L P I 

qqGCTAAAQG CTTGOCATIG TICCTQCACA GCEAAGTGCC TOQGrTCATC 100 
G R LAIV P A Q LSA WVHP 

AKG L P L FLHS .VP GFI 
GLKA CHC SCT AKCL GSS 

CTAA.TGGAQC TGAACACTAG TCACTGGGTT CCAQGGTTCT CTICCATGAC 150 
NRA EH. SLGS T V L FHD 

LIEL NTS HWV PRFS SMT 
. S S T L V TGF HGS LP.P 

(XATQGCTTC TAATAGAGCT ATAACACIGA CIGCATOGIC CAAGATTCCA 200 
PWLL I E L . H S LHGP RFH 
HGF SY NTH CMV QDSI 

MAS NRA ITLT AWS KIP 

TTCCTIGGAA TCCGIGAGAC CAAGAAGGGC AGGICAGAGA ACACAAGGCT 250 
SL E SVRP RTP GQR TQGL 
PWN P.D QEPQ VRE HKA 
FLGI RET KNP RSEN TRL 

TOQCACCATG TIGGAAGCAG CGCAGCACCA TTTTGGAAGC QGCCCGCCAC 300 

PPC WKQ PTTI LEA ARH 
CHHV GSS PPP FWKR PAT 
ATM LEAA HHH FG~S GPPL 

TATCTTGOGA GCTCIGOGAG CAAGGACCCC CAGGTAACAA TT1GGIGACC 350 
YLGS SGS KDP QVTI W . P 
ILG ALGA R TP R-Q FGDH 
SWE LWE QGPP GNN LVT 



ACGAAGQGAC CTGAATOGQC AAGCATGAAG QGATCTCCAA AGCAATIQGA 400
RRD LNPQ P.R DLQ SNWK
EGT . I R NHEG I S K A I G
TKGP ESA TMK GSPK QLE



FIG 10 (continued) 



10 20 30 40 50 

1234567890 1234567890 1234567890 1234567890 1234567890 

AATGTTCCTC CCAAGQCAAA AATOCCCCTA AGATGTATTC TGGAGAATIG 450 

CSS QGK NAPK MYS GEL 
NVPP KAK MPL RCIL ENW 
MFL PRQK CP. DVF WRIG 

GGACCAATCT GACCCICAGA CAGTAAGAAA AAAAA1GACT TATATTCTIC 500 
GPI. PSD SKK KNDL YSS 
DQS DPQT VRK KMT Y I L L 
TNL TLR Q.EK K.L IFF 

TGCAGTACCG CCTGGCCACG GATATCCTCT TCAAGGGGGA GAAACCTGGC 550 
AVP PGHG Y P L QGG ETWP 
QYR LAT DILF KGE KPG 
CSTA WPR ISS SRGR NLA 

CICCIGAGGG AAGTATAAAT TATAACACCA TCITACAGCT AGACCTGTTT 600 

PEG SIN YNTI LQL DLF 
LLRE V.I ITP SYS. TCF 
S.GKYKL .HHLTARPVL 

TGTAGAAAAG GAGGCAAATG GAGTGAAGTG CCATATITAC AAACTITCTT 650 
CRKG GKW SEV PYLQ TFF 
VEK EANG VKC HIY KLSF 
.KR RQM E.SA IFT NFL 

TICATEAAAA GACAACTCGC AATTATGTAA ACAGTGTGAT TIGTGTCCTA 700 
SLK DNSQ LCK QCD LCPT 
H.K TTR NYVN SVI CVL 
FIKR QLA IM. TV.F VSY 

CAGGAAGCCC TCAGATCTAC CTCCCTACCC CGGCATCTCC CIGACICCTT 750 

GSP QIY LPTP ASP .LL 
QEAL RST SLP RHLP DS.F 
RKP SDLP PYP GIS LTPS 

CCCCAACTAA TAAQGACCCA CITCAGCCCA AACAGTCCAA AAGGACATAG 800 
PQLI RTH FSP NSPK GH 
PN. G P T S A Q TVQ KDI 

PTN KDP LQPK QSK RT. 
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GQCATIGATA GCAGCCATCA GATGGCCAAA TCATTATITA CIQGACCAQG 50 
GIDS THQ MAK SLFT GPG 
A L I APIR WPN HYL L DQA 
H. . HPS DGQI IIY WTR 

C CTJ.T1 CAAA ACTATCAAGC AGATAGGGCC OCTGAAGGAT GCCAAAGAAA 100 
LFK TIKQ IGP VKH AKEI 
FSK LSS R.GP .SM PKK 
PFQN Y Q A D RA REAC QRN 

TAATCCCCIG CCTIATCQCC ATCTTOCTIC AGGAGAACAA AGAACAGGGC 150 

IPC L I A MFLQ ENK EQA 
. SPA LSP CSF RRTK NRP 
NPL PYRH VPS GEQ RTGH 

ATTACGCAGG GGAAGACIGG CAACTAGATT TTACCCACAT GGCGAAATGT 200 
ITQG KTG N.I LPTW PNV 
L p R GRLA TRF YPH GQMS 
YPG EDW QLDF THM AKC 

CAGGGATITC AGCATCIACT AGICIGGQCA GATACTITCA CIGGTIGGGT 250 
RDF SIY. SGQ ILS LVGW 
GIS AST SLGR YFH WLG 
QGFQ HLL VWA D TFT GWV 

GGAGTCTICT CCTIGTAGGA CAGAAAAGAC CCAAGAGGTA ATAAAGGCAC 300 

S L L LVG QKRP KR. .RH 
GVFS L.D RKD PRGN KGT 
ESS PCRT EKT QEV IKAL 

TAATGAAATA ATTCGCAGAT TIGGACTTQC CCCAGGATTA CAGGGTGACA 350 
. NN SQI WTS PRIT G.Q 
N E I IPRF G Li P PGL QGDN 
MK. FPD LDFP QDY RVT 
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AT33CCCCQC TITCAAGGCT QCAGIAAGQC AGGGAGTATC CCAQGIGITA. 400 
WPR FQGC SNP GSI PGVR 
GPA FKA AVTQ GVS QVL 
MAPL SRL Q.P REYP RC. 

GGGATACAAT ATCACTTACA CTC?IGGCIGG AGGCCACAAT (XTCCAGAAA 450 

HTI SLT LCLE ATI LQK 
GIQY HLH CAW RPQS SRK 
AYN ITYT VPG GHN PPEK 

AGTCAAGAAA ATGAATGAAA CACTCAAAGA TCTAAAAAAG CTAACCCAAG 500 
SQEN E.N TQR SKKA NPR 
VKK MNET LKD LKK LTQE 
SRK M K HSKI .KS .PK 

AAAGCCACAT TGCATGACCT GTICT3ITGC CTATAACCTT ACTAAGAATC 550 
NPH CMTC SVA YNL TKNP 
THI A.P VLLP ITL LRI 
KPTL HDL FCC L.PY .ES 

CATAACTATC CCCCAAAAAG CAQGACTTAG CCCATACGAG ATGCTATATG 600 

. LS PKK QDLA HTR CYM 
HNYP PKS RT. PIRD AIW 
ITI PQKA GLS PYE MLYG 

GATGGCCTTT CCTAACCAAT GACCTTGTGC TTGACIGAGA AATGGCCAAC 650 
DGLS .PM TLC LTEK WPT 
MAF PNQ. PCA L R NGQL 

WPF L T N DLVL D.E MAN 

TTAGTTGCAG ACATCACCTC CITAGCCAAA TATCAACAAG TICTTAAAAC 700 
. LQ TSPP .PN INK FLKH 
SCR HHL LSQI STS S.N 
LVAD ITS LAK YQQV LKT 
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ATCACAQGGA AOTIGTOGCC GAGAQGAGQG AAAGGAACEA. TTGCACaCIG 750 

HRE PVP ERRE RNY STL 
ITGN LSP RGG KGTI PPW 
SQG TCPR EEG KEL FHPG 

GIGACAIG 7 58 
V T 
. H 
D M 
Multiple sclerosis (MS) is a demyelinating disease of the
central nervous system (CNS) whose complete cause remains
unknown.

Numerous studies have supported the hypothesis of a viral
etiology of the disease, but none of the known viruses
tested has proved to be the causative agent sought: a
review of the viruses sought for years in MS has been
carried out by E. Norrby and R.T. Johnson.

Recently, a retrovirus different from the known human
retroviruses was isolated from patients suffering from MS.
The authors were also able to show that this retrovirus
could be transmitted in vitro, that patients suffering from
MS produced antibodies capable of recognizing proteins
associated with the infection of leptomeningeal cells by
this retrovirus, and that the expression of the latter
could be strongly stimulated by the immediate-early genes
of some herpesviruses.

All these results argue in favor of a role in MS for at
least one unknown retrovirus, or for a virus having a
reverse transcriptase (RT) activity which is detectable by
the method published by H. Perron and termed "LM7-type RT"
activity.

The studies by the Applicant have made it possible to
obtain two continuous cell lines infected with natural
isolates obtained from two different patients suffering
from MS, by a culture method as described in document
WO-A-93 20188, whose content is incorporated by reference
into the present description. These two lines, derived from
human choroid plexus cells and called LM7PC and PLI-2,
were deposited at the E.C.A.C.C. on 22 July 1992 and
8 January 1993, respectively, under numbers 92 072201 and
93 010817, in accordance with the provisions of the
Budapest Treaty. Moreover, the viral isolates possessing an
LM7-type RT activity have also been deposited at the
E.C.A.C.C. under the overall name of "strains". The
"strain" or isolate harbored by the PLI-2 line, called
POL-2, was deposited at the E.C.A.C.C. on 22 July 1992
under No. V92072202. The "strain" or isolate harbored by
the LM7PC line, called MS7PG, was deposited at the
E.C.A.C.C. on 8 January 1993 under No. V93010816.

Starting from the abovementioned cultures and isolates,
characterized by biological and morphological criteria,
work then focused on characterizing the nucleic material
associated with the viral particles produced in these
cultures.

The portions of the genome already characterized have been
used to develop tests for the molecular detection of the
viral genome, and immunoserological tests using the amino
acid sequences encoded by the nucleotide sequences of the
viral genome, in order to detect the immune response
directed against epitopes associated with viral infection
and/or expression.

These tools have already made it possible to confirm an
association between MS and the expression of the sequences
identified in the patents cited below. However, the viral
system discovered by the Applicant is related to a complex
retroviral system. Indeed, the sequences found encapsidated
in the extracellular viral particles produced by the
various cultures of cells from patients suffering from MS
clearly show that there is co-encapsidation of retroviral
genomes which are related to, but different from, the
"wild-type" retroviral genome which produces the infectious
viral particles. This phenomenon has been observed between
replicative retroviruses and endogenous retroviruses
belonging to the same family, or even heterologous ones.
The notion of endogenous retrovirus is very important in
the context of our discovery since, in the case of MSRV-1,
it has been observed that endogenous retroviral sequences
comprising sequences homologous to the MSRV-1 genome exist
in normal human DNA. The existence of endogenous retroviral
elements (ERVs) related to MSRV-1 through all or part of
their genome explains why the expression of the MSRV-1
retrovirus in human cells can interact with closely related
endogenous sequences. These interactions are found in the
case of pathogenic and/or infectious endogenous
retroviruses (for example some ecotropic strains of Murine
Leukaemia virus), and in the case of exogenous retroviruses
whose nucleotide sequence may be found partially or
completely, in the form of ERVs, in the genome of the host
animal (e.g. the exogenous mouse mammary tumor virus
transmitted through milk). These interactions consist
mainly of (i) a transactivation or co-activation of ERVs by
the replicative retrovirus, (ii) an "illegitimate"
encapsidation of RNAs related to ERVs, or of ERVs (or even
of cellular RNAs) which simply possess compatible
encapsidation sequences, into the retroviral particles
produced by the expression of the replicative strain, which
are sometimes transmissible and sometimes have their own
pathogenicity, and (iii) more or less substantial
recombinations between the co-encapsidated genomes, in
particular during the reverse transcription phases, which
lead to the formation of hybrid genomes, which are
sometimes transmissible and sometimes have their own
pathogenicity.

Thus, (i) various sequences related to MSRV-1 have been
found in the purified viral particles; (ii) the molecular
analysis of the various regions of the MSRV-1 retroviral
genome must be carried out by systematically analyzing the
co-encapsidated, interfering and/or recombined sequences
which are generated by the infection with and/or expression
of MSRV-1; furthermore, some clones may have portions of
defective sequences produced by retroviral replication and
by template and/or transcription errors of the reverse
transcriptase; (iii) the families of sequences related to
the same retroviral genomic region are the supports for an
overall diagnostic detection which can be optimized by the
identification of invariable regions among the expressed
clones, and by the identification of reading frames
responsible for the production of antigenic and/or
pathogenic polypeptides which may be produced by only a
portion, or even only one, of the expressed clones; under
these conditions, the systematic analysis of the clones
expressed in a region of a given gene makes it possible to
evaluate the frequency of variation and/or of recombination
of the MSRV-1 genome in this region and to define the
optimal sequences for the applications, in particular
diagnostic applications; (iv) the pathology caused by a
retrovirus such as MSRV-1 may be a direct effect of its
expression and of the proteins or peptides produced as a
result, but also an effect of the activation, encapsidation
or recombination of related or heterologous genomes and of
the proteins or peptides produced as a result; these
genomes associated with the expression of and/or infection
by MSRV-1 are thus an integral part of the potential
pathogenicity of this virus and therefore constitute
supports for diagnostic detection and specific therapeutic
targets. Likewise, any agent associated with, or cofactor
of, these interactions responsible for the pathogenesis in
question, such as MSRV-2 or the gliotoxic factor described
in the patent application published under No. FR-2 716 198,
may participate in the development of an overall and highly
effective strategy for the diagnosis, prognosis,
therapeutic monitoring and/or integrated therapy of MS in
particular, but also of any other disease associated with
the same agents.

In this context, a parallel discovery has been made in
another autoimmune disease, rheumatoid arthritis (RA),
which has been described in the French patent application
filed under No. 95 02960. This discovery shows that, by
applying methodological approaches similar to those used in
the Applicant's work on MS, it was possible to identify a
retrovirus expressed in RA which shares the sequences
described for MSRV-1 in MS, as well as the co-existence of
an associated MSRV-2 sequence also described in MS. As
regards MSRV-1, the sequences detected in both MS and RA
relate to the pol and gag genes. In the current state of
knowledge, the gag and pol sequences described can be
associated with the MSRV-1 strains expressed in these two
diseases.



The subject of the present patent application is various
results which are supplementary to those already protected
by the French patent applications:

- patent application WO-97/06260.

The present invention relates first of all to a nucleic
material, which may consist of a retroviral material, in
the isolated or purified state, which may be understood or
characterized in various ways:

- it comprises a nucleotide sequence chosen from the group
consisting of (i) the sequences SEQ ID NO: 112,
SEQ ID NO: 114, SEQ ID NO: 117, SEQ ID NO: 120,
SEQ ID NO: 124, SEQ ID NO: 130, SEQ ID NO: 141 and
SEQ ID NO: 142; (ii) the sequences complementary to the
sequences (i); and (iii) the sequences equivalent to the
sequences (i) or (ii), in particular the sequences
exhibiting, for any succession of 100 contiguous monomers,
at least 50%, and preferably at least 70%, homology with
the sequences (i) or (ii), respectively;

- it encodes a polypeptide exhibiting, for any contiguous
succession of at least 30 amino acids, at least 50%, and
preferably at least 70%, homology with a peptide sequence
chosen from the group consisting of SEQ ID NO: 113,
SEQ ID NO: 115, SEQ ID NO: 118, SEQ ID NO: 121,
SEQ ID NO: 135 and SEQ ID NO: 137;

- its pol gene comprises a nucleotide sequence identical or
equivalent to a sequence chosen from the group consisting
of SEQ ID NO: 112, SEQ ID NO: 124 and their complementary
sequences;

- the 5' end of its pol gene starts at nucleotide 1419 of
SEQ ID NO: 130;

- its pol gene encodes a polypeptide exhibiting, for any
contiguous succession of at least 30 amino acids, at least
50%, and preferably at least 70%, homology with the peptide
sequence SEQ ID NO: 113;

- the 3' end of its gag gene ends at nucleotide 1418 of
SEQ ID NO: 130;

- its env gene comprises a nucleotide sequence identical or
equivalent to a sequence chosen from the group consisting
of SEQ ID NO: 117 and its complementary sequences;

- its env gene comprises a nucleotide sequence which starts
at nucleotide 1 of SEQ ID NO: 117 and ends at nucleotide
233 of SEQ ID NO: 114;

- its env gene encodes a polypeptide exhibiting, for any
contiguous succession of at least 30 amino acids, at least
50%, and preferably at least 70%, homology with the
sequence SEQ ID NO: 118;

- the U3R region of its 3' LTR comprises a nucleotide
sequence which ends at nucleotide 617 of SEQ ID NO: 114;

- the RU5 region of its 5' LTR comprises a nucleotide
sequence which starts at nucleotide 755 of SEQ ID NO: 120
and ends at nucleotide 337 of SEQ ID NO: 141 or
SEQ ID NO: 142;

- a retroviral nucleic material comprising a sequence which
starts at nucleotide 755 of SEQ ID NO: 120 and which ends
at nucleotide 617 of SEQ ID NO: 114;

- the retroviral nucleic material as defined above is in
particular associated with at least one autoimmune disease
such as multiple sclerosis or rheumatoid arthritis.
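
By way of illustration only, the equivalence criterion stated above (at least 50%, preferably at least 70%, homology over any succession of 100 contiguous monomers) and the notion of a complementary sequence can be sketched in Python. The sketch rests on assumptions that the description does not itself state: "homology" is read as ungapped, position-by-position identity between two sequences that have already been aligned to equal length, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N is kept as N)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def min_window_identity(candidate: str, reference: str, window: int = 100) -> float:
    """Lowest fraction of identical positions over every run of `window`
    contiguous positions. Assumption: both sequences are pre-aligned,
    ungapped and of equal length."""
    candidate, reference = candidate.upper(), reference.upper()
    if len(candidate) != len(reference):
        raise ValueError("sequences must be pre-aligned to the same length")
    if len(candidate) < window:
        raise ValueError("sequences are shorter than the window")
    match = [c == r for c, r in zip(candidate, reference)]
    current = sum(match[:window])        # matches in the first window
    lowest = current
    for i in range(window, len(match)):  # slide the window one position at a time
        current += match[i] - match[i - window]
        lowest = min(lowest, current)
    return lowest / window


def is_equivalent(candidate: str, reference: str,
                  window: int = 100, threshold: float = 0.5) -> bool:
    """True if every window meets the threshold (0.5, or 0.7 for the
    preferred criterion)."""
    return min_window_identity(candidate, reference, window) >= threshold
```

The same functions may be applied with `window=30` to aligned peptide sequences, or with `window=10` for a short oligonucleotide, under the same assumption about what "homology" means.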

The invention also relates to a nucleotide fragment which
meets at least one of the following definitions:

- it comprises or consists of a nucleotide sequence chosen
from the group consisting of (i) the sequences
SEQ ID NO: 112, SEQ ID NO: 114, SEQ ID NO: 117,
SEQ ID NO: 120, SEQ ID NO: 124, SEQ ID NO: 130,
SEQ ID NO: 141 and SEQ ID NO: 142; (ii) the sequences
complementary to the sequences (i); and (iii) the sequences
equivalent to the sequences (i) or (ii), in particular the
sequences exhibiting, for any succession of 100 contiguous
monomers, at least 50%, and preferably at least 70%,
homology with the sequences (i) or (ii), respectively;

- it comprises or consists of a nucleotide sequence
encoding a polypeptide exhibiting, for any contiguous
succession of at least 30 amino acids, at least 50%, and
preferably at least 70%, homology with a peptide sequence
chosen from the group consisting of SEQ ID NO: 113,
SEQ ID NO: 115, SEQ ID NO: 118, SEQ ID NO: 121,
SEQ ID NO: 135 and SEQ ID NO: 137.
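
As a purely illustrative sketch, the relationship between a nucleotide fragment and the polypeptides it may encode can be shown by translating a sequence in its three forward reading frames, which is the layout the figures appear to use (one nucleotide line followed by three lines of amino acids). The sketch assumes the standard genetic code, writes stop codons as "." and any codon containing an undetermined base as "X"; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, stop: str = ".") -> str:
    """Translate a nucleotide sequence read from its first base.
    Stop codons become `stop`; codons with undetermined bases become 'X'.
    A trailing incomplete codon is ignored."""
    seq = seq.upper().replace("U", "T")
    residues = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        residues.append(stop if aa == "*" else aa)
    return "".join(residues)


def three_frame_translation(seq: str) -> list:
    """Translations of the three forward reading frames (offsets 0, 1, 2)."""
    return [translate(seq[offset:]) for offset in range(3)]
```

An open reading frame can then be located in each translated frame as a run of residues between two "." characters; the complementary strand can be examined in the same way after taking its reverse complement.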

Other subjects of the present invention are the following:

- a nucleic probe for the detection of a retrovirus
associated with multiple sclerosis and/or rheumatoid
arthritis, capable of hybridizing specifically with any
fragment defined above and belonging to the genome of said
retrovirus; it advantageously has from 10 to 100
nucleotides, preferably from 10 to 30 nucleotides;

- a primer for the amplification by polymerization of an RNA or a DNA of a retrovirus associated with multiple sclerosis and/or rheumatoid arthritis, which comprises a nucleotide sequence identical or equivalent to at least one portion of the nucleotide sequence of a fragment defined above, in particular a nucleotide sequence displaying, for any succession of 10 contiguous monomers, at least 50%, preferably at least 70%, homology with at least said portion of said fragment; preferably, the nucleotide sequence of a primer of the invention is chosen from SEQ ID NO: 116, SEQ ID NO: 119, SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID NO: 126, SEQ ID NO: 127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 132 and SEQ ID NO: 133;

- an RNA or DNA, and in particular a replication and/or expression vector, comprising a genomic fragment of the nucleic material or a fragment defined above;

- a peptide encoded by any open reading frame belonging to a nucleotide fragment defined above, in particular a polypeptide, for example an oligopeptide forming or comprising an antigenic determinant recognized by sera of patients infected with the MSRV-1 virus and/or in whom the MSRV-1 virus has been reactivated; a preferred peptide comprises a sequence identical, partially or totally, or equivalent to a sequence chosen from SEQ ID NO: 113, SEQ ID NO: 115, SEQ ID NO: 118, SEQ ID NO: 121, SEQ ID NO: 135 and SEQ ID NO: 137;



- a diagnostic, prophylactic or therapeutic composition, in particular for inhibiting the expression of at least one retrovirus associated with multiple sclerosis and/or rheumatoid arthritis, comprising a nucleotide fragment defined above;

- a method for detecting a retrovirus associated with multiple sclerosis and/or rheumatoid arthritis in a biological sample, comprising the steps consisting in bringing an RNA and/or a DNA presumed to belong to or to originate from said retrovirus, or their complementary RNA and/or DNA, into contact with a composition comprising a nucleotide fragment defined above.

Before describing the invention in detail, various terms used in the description and the claims are now defined:

- strain or isolate is understood to mean any infective and/or pathogenic biological fraction containing, for example, viruses and/or bacteria and/or parasites, generating pathogenic and/or antigenic power, harbored by a culture or a living host; as an example, a viral strain according to the above definition can contain a coinfective agent, for example a pathogenic protist,

- the term "MSRV" used in the present description denotes any pathogenic and/or infective agent associated with MS, in particular a viral species, the attenuated strains of said viral species, or the defective-interfering particles or particles containing coencapsidated genomes, or alternatively genomes recombined with a portion of the MSRV-1 genome, derived from this species. Viruses, and especially viruses containing RNA, are known to have a variability resulting, in particular, from relatively high rates of spontaneous mutation, which will be borne in mind below in defining the notion of equivalence,

- human virus is understood to mean a virus capable of infecting, or of being harbored by, human beings,



- in view of all the natural or induced variations and/or recombinations which may be encountered when implementing the present invention, the subjects of the latter, defined above and in the claims, have been expressed so as to include the equivalents or derivatives of the different biological materials defined below, in particular of the homologous nucleotide or peptide sequences,

- the variant of a virus or of a pathogenic and/or infective agent according to the invention comprises at least one antigen recognized by at least one antibody directed against at least one corresponding antigen of said virus and/or said pathogenic and/or infective agent, and/or a genome any part of which is detected by at least one hybridization probe and/or at least one nucleotide amplification primer specific for said virus and/or pathogenic and/or infective agent, under defined hybridization conditions well known to a person skilled in the art,

- according to the invention, a nucleotide fragment or an oligonucleotide or a polynucleotide is an arrangement of monomers, or a biopolymer, characterized by the informational sequence of the natural nucleic acids, which is capable of hybridizing with any other nucleotide fragment under predetermined conditions, it being possible for the arrangement to contain monomers of different chemical structures and to be obtained from a molecule of natural nucleic acid and/or by genetic recombination and/or by chemical synthesis; a nucleotide fragment may be identical to a genomic fragment of the MSRV-1 virus considered by the present invention, in particular a gene of this virus, for example pol or env in the case of said virus;

- thus, a monomer can be a natural nucleotide of nucleic acid whose constituent elements are a sugar, a phosphate group and a nitrogenous base; in RNA the sugar is ribose, in DNA the sugar is 2-deoxyribose; depending on whether DNA or RNA is involved, the nitrogenous base is chosen from adenine, guanine, uracil, cytosine and thymine; or the nucleotide can be modified in at least one of the three constituent elements; as an example, the modification can occur in the bases, generating modified bases such as inosine, 5-methyldeoxycytidine, deoxyuridine, 5-(dimethylamino)deoxyuridine, 2,6-diaminopurine, 5-bromodeoxyuridine and any other modified base promoting hybridization; in the sugar, the modification can consist of the replacement of at least one deoxyribose by a polyamide; and in the phosphate group, the modification can consist of its replacement by esters, chosen in particular from diphosphate, alkyl- and arylphosphonate and phosphorothioate esters,

- "informational sequence" is understood to mean any ordered succession of monomers whose chemical nature and order in a reference direction constitute or otherwise an item of functional information of the same quality as that of the natural nucleic acids,

- hybridization is understood to mean the process during which, under suitable working conditions, two nucleotide fragments having sufficiently complementary sequences pair to form a complex structure, in particular double or triple, preferably in the form of a helix,

- a probe comprises a nucleotide fragment synthesized chemically or obtained by digestion or enzymatic cleavage of a longer nucleotide fragment, comprising at least six monomers, advantageously from 10 to 100 monomers and preferably 10 to 30 monomers, and possessing a specificity of hybridization under defined conditions; preferably, a probe possessing fewer than 10 monomers is not used alone, but is used in the presence of other probes of equally short size or otherwise; under certain special conditions, it may be useful to use probes of size greater than 100 monomers; a probe may be used, in particular, for diagnostic purposes, such molecules being, for example, capture and/or detection probes,

- the capture probe may be immobilized on a solid support by any suitable means, that is to say directly or indirectly, for example by covalent bonding or passive adsorption,

- the detection probe may be labeled by means of a label chosen, in particular, from radioactive isotopes, enzymes chosen, in particular, from peroxidase and alkaline phosphatase and those capable of hydrolyzing a chromogenic, fluorogenic or luminescent substrate, chromophoric chemical compounds, chromogenic, fluorogenic or luminescent compounds, nucleotide base analogs and biotin,

- the probes of the invention used for diagnostic purposes may be employed in all known hybridization techniques, and in particular the techniques termed "DOT-BLOT", "SOUTHERN BLOT", "NORTHERN BLOT", which is a technique identical to the "SOUTHERN BLOT" technique but which uses RNA as target, and the SANDWICH technique; advantageously, the SANDWICH technique is used in the present invention, comprising a specific capture probe and/or a specific detection probe, on the understanding that the capture probe and the detection probe must possess an at least partially different nucleotide sequence,

- any probe according to the present invention can hybridize in vivo or in vitro with RNA and/or with DNA in order to block the phenomena of replication, in particular translation and/or transcription, and/or to degrade said DNA and/or RNA,

- a primer is a probe comprising at least six monomers, and advantageously from 10 to 30 monomers, possessing a specificity of hybridization under defined conditions for the initiation of an enzymatic polymerization, for example in an amplification technique such as PCR (polymerase chain reaction), in an elongation process such as sequencing, in a method of reverse transcription or the like,

- two nucleotide or peptide sequences are termed equivalent or derived with respect to one another, or with respect to a reference sequence, if functionally the corresponding biopolymers can play substantially the same role, without being identical, as regards the application or use in question, or in the technique in which they participate; two sequences are, in particular, equivalent if they are obtained as a result of natural variability, in particular spontaneous mutation of the species from which they have been identified, or induced variability, as are two homologous sequences, homology being defined below,

- "variability" is understood to mean any spontaneous or induced modification of a sequence, in particular by substitution and/or insertion and/or deletion of nucleotides and/or of nucleotide fragments, and/or extension and/or shortening of the sequence at one or both ends; an unnatural variability can result from the genetic engineering techniques used, for example from the choice of synthetic primers, degenerate or otherwise, selected for amplifying a nucleic acid; this variability can manifest itself in modifications of any starting sequence, considered as reference, and capable of being expressed by a degree of homology relative to said reference sequence,

- homology characterizes the degree of identity of two nucleotide or peptide fragments which are compared; it is measured by the percentage identity, which is determined, in particular, by direct comparison of nucleotide or peptide sequences relative to reference nucleotide or peptide sequences,
- any nucleotide fragment is termed equivalent to or derived from a reference fragment if it possesses a nucleotide sequence equivalent to the sequence of the reference fragment; according to the above definition, the following are, in particular, equivalent to a reference nucleotide fragment:

(a) any fragment capable of hybridizing at least partially with the complement of the reference fragment,

(b) any fragment whose alignment with the reference fragment reveals identical contiguous bases in larger numbers than with any other fragment originating from another taxonomic group,

(c) any fragment resulting, or capable of resulting, from the natural variability of the species from which it is obtained,

(d) any fragment capable of resulting from the genetic engineering techniques applied to the reference fragment,

(e) any fragment containing at least eight contiguous nucleotides and coding for a peptide homologous or identical to the peptide encoded by the reference fragment,

(f) any fragment which differs from the reference fragment by insertion, deletion or substitution of at least one monomer, or by extension or shortening at one or both of its ends; for example, any fragment corresponding to the reference fragment flanked at one or both of its ends by a nucleotide sequence not coding for a polypeptide,

- polypeptide is understood to mean, in particular, any peptide of at least two amino acids, in particular an oligopeptide or protein, extracted, separated or substantially isolated or synthesized through human intervention, in particular those obtained by chemical synthesis or by expression in a recombinant organism,

- polypeptide partially encoded by a nucleotide fragment is understood to mean a polypeptide possessing at least three amino acids encoded by at least nine contiguous monomers lying within said nucleotide fragment,

- an amino acid is termed analogous to another amino acid when their respective physicochemical properties, such as polarity, hydrophobicity and/or basicity and/or acidity and/or neutrality, are substantially the same; thus, a leucine is analogous to an isoleucine.

- any polypeptide is termed equivalent to or derived from a reference polypeptide if the polypeptides compared have substantially the same properties, and in particular the same antigenic, immunological, enzymological and/or molecular recognition properties; the following are, in particular, equivalent to a reference polypeptide:

(a) any polypeptide possessing a sequence in which at least one amino acid has been replaced by an analogous amino acid,

(b) any polypeptide having an equivalent peptide sequence, obtained by natural or induced variation of said reference polypeptide and/or of the nucleotide fragment coding for said polypeptide,

(c) a mimotope of said reference polypeptide,

(d) any polypeptide in whose sequence one or more amino acids of the L series are replaced by an amino acid of the D series, and vice versa,

(e) any polypeptide into whose sequence a modification of the side chains of the amino acids has been introduced, such as, for example, an acetylation of the amine functions, a carboxylation of the thiol functions or an esterification of the carboxyl functions,

(f) any polypeptide in whose sequence one or more peptide bonds have been modified, such as, for example, carba, retro, inverso, retro-inverso, reduced and methylenoxy bonds,

(g) any polypeptide at least one antigen of which is recognized by an antibody directed against a reference polypeptide,

- the percentage identity characterizing the homology of two peptide fragments which are compared is, according to the present invention, at least 50% and preferably at least 70%.
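
The homology thresholds used in the definitions above (at least 50%, preferably at least 70%, identity over any contiguous run of monomers or amino acids) can be illustrated with a minimal sketch. It is not the applicant's method: it assumes the two sequences have already been aligned without gaps and have equal length, and the names percent_identity and min_window_identity are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two pre-aligned,
    equal-length sequences (assumption: no gaps)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def min_window_identity(query: str, reference: str, window: int) -> float:
    """Lowest percent identity found over every contiguous window.

    A fragment meets an 'any contiguous succession of N monomers'
    criterion only if this minimum reaches the chosen threshold.
    """
    if len(query) != len(reference) or len(query) < window:
        raise ValueError("need equal-length sequences at least one window long")
    return min(
        percent_identity(query[i:i + window], reference[i:i + window])
        for i in range(len(query) - window + 1)
    )


def is_homologous(query: str, reference: str, window: int,
                  threshold: float = 50.0) -> bool:
    """True if every window reaches the threshold (50% by default)."""
    return min_window_identity(query, reference, window) >= threshold
```

With window=100 and nucleotide sequences this mirrors the nucleotide criterion; with window=30 and peptide sequences it mirrors the polypeptide criterion. A real comparison would first require an alignment step, which this sketch deliberately leaves out.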

Since a virus possessing reverse transcriptase enzymatic activity may be genetically characterized equally well in RNA and in DNA form, both the viral DNA and the viral RNA will be mentioned for characterizing the sequences relating to a virus possessing such reverse transcriptase activity, termed MSRV-1 according to the present description.

The expressions of order used in the present description and the claims, such as "first nucleotide sequence", are not adopted so as to express a particular order, but so as to define the invention more clearly.

Detection of a substance or agent is understood below to mean both an identification and a quantification, or a separation or isolation, of said substance or said agent.

The invention will be better understood on reading the detailed description which follows, given with reference to the attached figures, in which:

Figure 1 represents the general structure of the proviral DNA and the genomic RNA of MSRV-1.

Figure 2 represents the nucleotide sequence of the clone designated CL6-5' (SEQ ID NO: 112) and three potential reading frames in amino acids shown under the nucleotide sequence.

Figure 3 represents the nucleotide sequence of the clone designated CL6-3' (SEQ ID NO: 114) and three potential reading frames in amino acids shown under the nucleotide sequence.

Figure 4 represents the nucleotide sequence of the clone designated C15 (SEQ ID NO: 117) and three potential reading frames in amino acids shown under the nucleotide sequence.

Figure 5 represents the nucleotide sequence of the clone designated 5M6 (SEQ ID NO: 120) and three potential reading frames in amino acids shown under the nucleotide sequence.

Figure 6 represents the nucleotide sequence of the clone designated CL2 (SEQ ID NO: 130) and three potential reading frames in amino acids shown under the nucleotide sequence.

Figure 7 represents three potential reading frames in amino acids expressed by pET28C-clone 2 and shown under the nucleotide sequence.

Figure 8 represents three potential reading frames in amino acids expressed by pET21C-clone 2 and shown under the nucleotide sequence.

Figure 9 represents the nucleotide sequence of the clone designated LB13 (SEQ ID NO: 141) and three potential reading frames in amino acids shown under the nucleotide sequence.

Figure 10 represents the nucleotide sequence of the clone designated LA15 (SEQ ID NO: 142) and three potential reading frames in amino acids shown under the nucleotide sequence.

Figure 11 represents the nucleotide sequence of the clone designated LB16 (SEQ ID NO: 124) and three potential reading frames in amino acids shown under the nucleotide sequence.

EXAMPLE 1: OBTAINING A CL6-5' REGION CODING FOR THE N-TERMINAL END OF THE INTEGRASE AND A CL6-3' REGION CONTAINING THE 3' TERMINAL SEQUENCE OF THE MSRV-1 GENOME

A 3' RACE was performed on total RNA extracted from the plasma of a patient suffering from MS. A healthy control plasma, treated under the same conditions, was used as a negative control. cDNA synthesis was carried out with an oligo-dT primer identified by SEQ ID NO: 68 (5' GAC TCG CTG CAG ATC GAT TTT TTT TTT TTT TTT T 3') and the "Expand™ RT" reverse transcriptase from Boehringer, under the conditions recommended by the company. A PCR was performed with the enzyme Klentaq (Clontech) under the following conditions: 94 °C for 5 min, then 93 °C for 1 min, 58 °C for 1 min and 68 °C for 3 min for 40 cycles, and 68 °C for 8 min, with a final reaction volume of 50 µl.
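
As a rough aid to reading the cycling conditions above, the sketch below encodes a program of this shape and sums its programmed hold times. The Step class and the program_minutes function are hypothetical names introduced only for illustration, and the total ignores temperature ramping, which depends on the thermocycler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int    # block temperature in degrees Celsius
    minutes: int   # programmed hold time in minutes


def program_minutes(initial: Step, cycle: list[Step],
                    n_cycles: int, final: Step) -> int:
    """Sum of programmed hold times, excluding ramp time (assumption)."""
    per_cycle = sum(step.minutes for step in cycle)
    return initial.minutes + n_cycles * per_cycle + final.minutes


# Conditions as stated for the first PCR of Example 1.
total = program_minutes(
    initial=Step(94, 5),
    cycle=[Step(93, 1), Step(58, 1), Step(68, 3)],
    n_cycles=40,
    final=Step(68, 8),
)
print(f"programmed hold time: {total} min")
```

Under these assumptions the program amounts to 5 + 40 x 5 + 8 = 213 minutes of hold time.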

Primers used for the PCR:

- 5' primer, identified by SEQ ID NO: 69
5' GCC ATC AAG CCA CCC AAG AAC TCT TAA CTT 3';

- 3' primer, identified by SEQ ID NO: 68

A second, so-called "semi-nested" PCR was carried out with a 5' primer located inside the region already amplified. This second PCR was performed under the same experimental conditions as those used in the first PCR, using 10 µl of the amplification product obtained from the first PCR.

Primers used for the semi-nested PCR:

- 5' primer, identified by SEQ ID NO: 70
5' CCA ATA GCC AGA CCA TTA TAT ACA CTA ATT 3';

- 3' primer, identified by SEQ ID NO: 68

The primers SEQ ID NO: 69 and SEQ ID NO: 70 are specific for the pol region of MSRV-1.
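
A quick way to sanity-check primers transcribed from a text such as this one is to compute their length, GC content and reverse complement. The sketch below does so for the two pol-specific primers; the helper names clean, gc_percent and reverse_complement are hypothetical and are not taken from the source.

```python
# Primer sequences as printed in the text (spaces separate triplets).
PRIMERS = {
    "SEQ ID NO: 69": "GCC ATC AAG CCA CCC AAG AAC TCT TAA CTT",
    "SEQ ID NO: 70": "CCA ATA GCC AGA CCA TTA TAT ACA CTA ATT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove the spacing used in the printed sequence and upper-case it."""
    return seq.replace(" ", "").upper()


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the primer."""
    s = clean(seq)
    return 100.0 * sum(base in "GC" for base in s) / len(s)


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return clean(seq).translate(COMPLEMENT)[::-1]


for name, seq in PRIMERS.items():
    print(name, len(clean(seq)), "nt,", f"GC {gc_percent(seq):.1f}%")
```

Both primers are 30 nucleotides long, which sits at the upper end of the 10 to 30 monomer range given for primers in the definitions.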

An amplification product of 1.9 kb was obtained for the plasma of the MS patient. The corresponding fragment was not observed for the healthy control plasma. This amplification product was cloned in the following manner:

the amplified DNA was inserted into a plasmid using the TA Cloning® kit. The 2 µl of DNA solution were mixed with 5 µl of sterile distilled water, 1 µl of a 10-fold concentrated ligation buffer "10X LIGATION BUFFER", 2 µl of "pCR™ VECTOR" (25 ng/ml) and 1 µl of "T4 DNA LIGASE". This mixture was incubated overnight at 12 °C. The following steps were carried out according to the instructions of the TA Cloning® kit (Invitrogen). At the end of the procedure, the white colonies of recombinant bacteria were picked out in order to be cultured and to permit extraction of the incorporated plasmids according to the so-called "miniprep" procedure. The plasmid preparation from each recombinant colony was cut with a suitable restriction enzyme and analyzed on agarose gel. Plasmids possessing an insert, detected under UV light after staining the gel with ethidium bromide, were selected for sequencing of the insert, after hybridization with a primer complementary to the Sp6 promoter present on the cloning plasmid of the TA Cloning® kit. The reaction prior to sequencing was then performed according to the method recommended for the use of the sequencing kit "PRISM™ Ready Reaction AmpliTaq® FS, DyeDeoxy™ Terminator" (Applied Biosystems, ref. 402119), and automatic sequencing was carried out on the Applied Biosystems 373 A and 377 instruments, according to the manufacturer's instructions.

The clone obtained contains a CL6-5' region coding for the N-terminal end of the integrase and a CL6-3' region corresponding to the 3' terminal region of MSRV-1, making it possible to define the end of the envelope (234 bp) and the U3 and R regions (401 bp) of the MSRV-1 retrovirus.

The region corresponding to the N-terminal end of the integrase is represented by its nucleotide sequence (SEQ ID NO: 112) in Figure 1. The three potential reading frames are presented by their amino acid sequence under the nucleotide sequence, and the amino acid sequence of the N-terminal end of the integrase is identified by SEQ ID NO: 113.

The CL6-3' region is represented by its nucleotide sequence (SEQ ID NO: 114) in Figure 3. The three potential reading frames are presented by their amino acid sequence under the nucleotide sequence. An amino acid sequence corresponding to the C-terminal end of the MSRV-1 env protein is identified by SEQ ID NO: 115.
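
The "three potential reading frames" shown under each nucleotide sequence correspond to translating the same strand from offsets 0, 1 and 2. The sketch below is only an illustration of that idea, assuming the standard genetic code; the names translate and three_frames are hypothetical, and the short input used here is not one of the patent's sequences.

```python
# Standard genetic code, laid out in the usual T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"   # first base T
         "LLLLPPPPHHQQRRRR"   # first base C
         "IIIMTTTTNNKKSSRR"   # first base A
         "VVVVAAAADDEEGGGG")  # first base G
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons only; '*' marks a stop codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def three_frames(dna: str) -> list[str]:
    """Amino acid translations of the forward strand at offsets 0, 1 and 2."""
    dna = dna.replace(" ", "").upper()
    return [translate(dna[offset:]) for offset in range(3)]


for frame, protein in enumerate(three_frames("ATGGCCTAA"), start=1):
    print(f"frame {frame}: {protein}")
```

In practice the frame retained as a coding sequence is the one that stays open over the region of interest, which is how a single frame comes to be singled out under a SEQ ID NO in the examples.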

EXAMPLE 2: OBTAINING THE CLONE C15 CONTAINING THE REGION CODING FOR A PORTION OF THE ENVELOPE OF THE MSRV-1 RETROVIRUS

An RT-PCR was performed on total RNA extracted from virions concentrated by ultracentrifugation from a culture supernatant of synoviocytes originating from a rheumatoid arthritis (RA) patient. cDNA synthesis was carried out with an oligo-dT primer and the "Expand™ RT" reverse transcriptase from Boehringer, under the conditions recommended by the company. A PCR was performed with the Expand™ Long Template PCR System (Boehringer) under the following conditions: 94 °C for 5 min, then 93 °C for 1 min, 60 °C for 1 min and 68 °C for 3 min for 40 cycles, and 68 °C for 8 min, with a final reaction volume of 50 µl.
Primers used for the PCR:

- 5' primer, identified by SEQ ID NO: 69
5' GCC ATC AAG CCA CCC AAG AAC TCT TAA CTT 3';

- 3' primer, identified by SEQ ID NO: 116
5' TGG GGT TCC ATT TGT AAG ACC ATC TGT AGC TT 3'

A second, so-called "semi-nested" PCR was carried out with a 5' primer located inside the region already amplified. This second PCR was performed under the same experimental conditions as those used in the first PCR (except that 30 cycles were carried out instead of 40), using 10 µl of the amplification product obtained from the first PCR.

Primers used for the semi-nested PCR:

- 5' primer, identified by SEQ ID NO: 70
5' CCA ATA GCC AGA CCA TTA TAT ACA CTA ATT 3';

- 3' primer, identified by SEQ ID NO: 116

The primers SEQ ID NO: 69 and SEQ ID NO: 70 are specific for the pol region of MSRV-1. The primer SEQ ID NO: 116 is specific for the sequence FBd13 (also designated B13) and is located in the env region conserved among oncoretroviruses.

An amplification product of 1932 bp was obtained and cloned in the following manner:

the amplified DNA was inserted into a plasmid using the TA Cloning® kit. The various steps were carried out according to the instructions of the TA Cloning® kit (Invitrogen). At the end of the procedure, the white colonies of recombinant bacteria were picked out in order to be cultured and to permit extraction of the incorporated plasmids according to the so-called "miniprep" procedure. The plasmid preparation from each recombinant colony was cut with a suitable restriction enzyme and analyzed on agarose gel. Plasmids possessing an insert, detected under UV light after staining the gel with ethidium bromide, were selected for sequencing of the insert, after hybridization with a primer complementary to the SP6 promoter present on the cloning plasmid of the TA Cloning® kit. The reaction prior to sequencing was then performed according to the method recommended for the use of the sequencing kit "PRISM™ Ready Reaction AmpliTaq® FS, DyeDeoxy™ Terminator" (Applied Biosystems, ref. 402119), and automatic sequencing was carried out on the Applied Biosystems 373 A and 377 instruments, according to the manufacturer's instructions.



The clone C15 obtained contains a region of 1481 bp corresponding to the envelope region of MSRV-1.

The env region of the clone C15 is represented by its nucleotide sequence (SEQ ID NO: 117) in Figure 5. The three potential reading frames of this clone are presented by their amino acid sequence under the nucleotide sequence. The reading frame corresponding to an MSRV-1 structural env protein is identified by SEQ ID NO: 118.



EXAMPLE 3: OBTAINING A CLONE 5M6 CONTAINING THE SEQUENCES OF THE 3' TERMINAL REGION OF THE ENVELOPE, FOLLOWED BY THE U3, R, U5 SEQUENCES OF MSRV-1 PROVIRAL TYPE.

A unidirectional PCR was performed on DNA extracted from immortalized B lymphocytes in culture from a rheumatoid arthritis (RA) patient. The PCR was performed with the Expand™ Long Template PCR System (Boehringer) under the following conditions: 94 °C for 3 min, then 93 °C for 1 min, 60 °C for 1 min and 68 °C for 3 min for 10 cycles, then 93 °C for 1 min, 60 °C for 1 min, with 15 s of extension added at each cycle, and 68 °C for 3 min for 35 cycles, and 68 °C for 7 min, with a final reaction volume of 50 µl.
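
The second phase of this program lengthens a step at each cycle, so its duration is not a simple product. The sketch below estimates the programmed hold time under one possible reading of the conditions, stated here as an assumption: the 15 s is added cumulatively to the 68 °C elongation step in each of the 35 later cycles (3 min 15 s in the first of them, 3 min 30 s in the second, and so on). The function name total_seconds is hypothetical, and ramp times are ignored.

```python
def total_seconds(initial_s: int, fixed_cycles: int, growing_cycles: int,
                  base_cycle_s: int, increment_s: int, final_s: int) -> int:
    """Programmed hold time in seconds for a two-phase PCR program.

    Assumption: in the second phase, cycle k (k = 1..growing_cycles)
    lasts base_cycle_s + k * increment_s.
    """
    fixed = fixed_cycles * base_cycle_s
    growing = sum(base_cycle_s + increment_s * k
                  for k in range(1, growing_cycles + 1))
    return initial_s + fixed + growing + final_s


# 94 C 3 min; 10 cycles of (1 + 1 + 3) min; 35 cycles with +15 s each; 68 C 7 min.
seconds = total_seconds(initial_s=180, fixed_cycles=10, growing_cycles=35,
                        base_cycle_s=300, increment_s=15, final_s=420)
print(f"{seconds / 60:.1f} min")
```

If the 15 s were instead a fixed addition rather than a cumulative one, the total would be shorter; the source wording does not settle this, so the figure should be taken only as an order of magnitude.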

The primer used for the PCR, identified by SEQ ID NO: 119, is 5' TCA AAA TCG AAG AGC TTT AGA CTT GCT AAC CG 3'.

The primer SEQ ID NO: 119 is specific for the env region of the clone C15.

An amplification product of 1673 bp was obtained and cloned in the following manner:

the amplified DNA was inserted into a plasmid using the TA Cloning® kit. The various steps were carried out according to the instructions of the TA Cloning® kit (Invitrogen). At the end of the procedure, the white colonies of recombinant bacteria were picked out in order to be cultured and to permit extraction of the incorporated plasmids according to the so-called "miniprep" procedure. The plasmid preparation from each recombinant colony was cut with a suitable restriction enzyme and analyzed on agarose gel. Plasmids possessing an insert, detected under UV light after staining the gel with ethidium bromide, were selected for sequencing of the insert, after hybridization with a primer complementary to the T7 promoter present on the cloning plasmid of the TA Cloning® kit. The reaction prior to sequencing was then performed according to the method recommended for the use of the sequencing kit "PRISM™ Ready Reaction AmpliTaq® FS, DyeDeoxy™ Terminator" (Applied Biosystems, ref. 402119), and automatic sequencing was carried out on the Applied Biosystems 373 A and 377 instruments, according to the manufacturer's instructions.

The clone 5M6 obtained contains a region of 492 bp corresponding to the 3' region of the MSRV-1 envelope, followed by the U3, R and U5 regions (837 bp) of MSRV-1.

The clone 5M6 is represented by its nucleotide sequence (SEQ ID NO: 120) in Figure 7. The three potential reading frames of this clone are presented by their amino acid sequence under the nucleotide sequence. The reading frame corresponding to the C-terminal end of the MSRV-1 env protein is identified by SEQ ID NO: 121.

EXAMPLE 4: OBTAINING THE CLONE LB16 CONTAINING THE REGION CODING FOR THE INTEGRASE OF THE MSRV-1 RETROVIRUS

An RT-PCR was performed on total RNA treated with DNase I and extracted from a choroid plexus originating from an MS patient. cDNA synthesis was carried out with an oligo-dT primer and the "Expand™ RT" reverse transcriptase from Boehringer, under the conditions recommended by the company. A "no RT" control was performed in parallel on the same material. A PCR was performed with Taq polymerase (Perkin Elmer) under the following conditions: 95 °C for 5 min, then 95 °C for 1 min, 55 °C for 1 min and 72 °C for 2 min for 35 cycles, and 72 °C for 8 min, with a final reaction volume of 50 µl.
Primers used for the PCR:

- 5' primer, identified by SEQ ID NO: 122
5' GGC ATT GAT AGC ACC CAT CAG 3';

- 3' primer, identified by SEQ ID NO: 123
5' CAT GTC ACC AGG GTG GAA TAG 3'

The primer SEQ ID NO: 122 is specific for the pol region of MSRV-1, and more precisely is similar to the integrase region described above. The primer SEQ ID NO: 123 was defined on sequences of the clones obtained in preliminary tests.

An amplification product of approximately 760 bp was obtained only in the test with RT, and was cloned in the following manner:

the amplified DNA was inserted into a plasmid using the TA Cloning® kit. The various steps were carried out according to the instructions of the TA Cloning® kit (Invitrogen). At the end of the procedure, the white colonies of recombinant bacteria were picked out in order to be cultured and to permit extraction of the incorporated plasmids according to the so-called "miniprep" procedure. The plasmid preparation from each recombinant colony was cut with a suitable restriction enzyme and analyzed on agarose gel. Plasmids possessing an insert, detected under UV light after staining the gel with ethidium bromide, were selected for sequencing of the insert, after hybridization with a primer complementary to the T7 promoter present on the cloning plasmid of the TA Cloning® kit. The reaction prior to sequencing was then performed according to the method recommended for the use of the sequencing kit "PRISM™ Ready Reaction AmpliTaq® FS, DyeDeoxy™ Terminator" (Applied Biosystems, ref. 402119), and automatic sequencing was carried out on the Applied Biosystems 373 A and 377 instruments, according to the manufacturer's instructions.

The clone obtained, LB16, contains the sequences
corresponding to the integrase. The nucleotide sequence of
this clone is identified by SEQ ID NO: 124 in Figure 11;
three reading frames are determined.

EXAMPLE 5: PRODUCTION OF A CLONE 2, CL2, CONTAINING, AT THE
3' END, A PORTION HOMOLOGOUS TO THE POL GENE, CORRESPONDING
TO THE PROTEASE GENE, AND TO THE GAG GENE (GM3),
CORRESPONDING TO THE NUCLEOCAPSID, AND A NEW 5' CODING
REGION, CORRESPONDING TO THE GAG GENE, MORE SPECIFICALLY
THE MATRIX AND THE CAPSID OF MSRV-1.

A PCR amplification was performed on total RNA extracted
from 100 µl of plasma from a patient suffering from MS. A
water control, treated under the same conditions, was used
as a negative control. cDNA synthesis was carried out with
300 pmol of a random primer (GIBCO-BRL, France) and
"Expand RT" reverse transcriptase (BOEHRINGER MANNHEIM,
France) according to the conditions recommended by the
company. A PCR (polymerase chain reaction) amplification
was performed with Taq polymerase (Perkin Elmer, France)
using 10 µl of cDNA under the following conditions: 94°C
for 2 min, 55°C for 1 min, 72°C for 2 min, then 30 cycles
of 94°C for 1 min, 55°C for 1 min and 72°C for 2 min,
followed by 72°C for 7 min, in a final reaction volume of
50 µl.
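The cycling program described above can be written down as data and its total hold time summed, which is a convenient way to compare the programs used in the different examples. This is a minimal sketch: the `Step` class and `program_minutes` helper are hypothetical names, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass
class Step:
    temp_c: int      # block temperature in degrees Celsius
    minutes: float   # hold time at that temperature


def program_minutes(initial, cycle, n_cycles, final):
    """Sum hold times only; ramping between temperatures is ignored."""
    total = sum(s.minutes for s in initial)
    total += n_cycles * sum(s.minutes for s in cycle)
    total += sum(s.minutes for s in final)
    return total


# Program of Example 5, first PCR, as stated in the text.
initial = [Step(94, 2), Step(55, 1), Step(72, 2)]
cycle = [Step(94, 1), Step(55, 1), Step(72, 2)]
final = [Step(72, 7)]

total = program_minutes(initial, cycle, 30, final)
print(f"Total hold time: {total} min")
```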

Primers used for the PCR amplification:

- 5' primer, identified by SEQ ID NO: 126
5' CGG ACA TCC AAA GTG ATG GGA AAC G 3';

- 3' primer, identified by SEQ ID NO: 127
5' GGA CAG GAA AGT AAG ACT GAG AAG GC 3'



A second PCR amplification, termed "semi-nested", was
carried out with a 5' primer located within the region
already amplified. This second PCR was performed under the
same experimental conditions as those used for the first
PCR, using 10 µl of the amplification product from the
first PCR.

Primers used for the semi-nested PCR amplification:

- 5' primer, identified by SEQ ID NO: 128
5' CCT AGA ACG TAT TCT GGA GAA TTG GG 3';

- 3' primer, identified by SEQ ID NO: 129
5' TGG CTC TCA ATG GTC AAA CAT ACC CG 3'

Primers SEQ ID NO: and SEQ ID NO: are specific for the
pol region, clone G+E+A, more specifically region E:
nucleotide positions 423 to 448. The primers used in the
5' region were defined from sequences of clones obtained
in preliminary trials.

An amplification product of 1511 bp was obtained from the
RNA extracted from the plasma of the MS patient. The
corresponding fragment was not observed for the water
control. This amplification product was cloned as follows.

The amplified DNA was inserted into a plasmid using the
TA Cloning™ kit. The 2 µl of DNA solution were mixed with
5 µl of sterile distilled water, 1 µl of 10-fold
concentrated ligation buffer ("10X LIGATION BUFFER"), 2 µl
of "pCR™ VECTOR" (25 ng/ml) and 1 µl of "T4 DNA LIGASE".
This mixture was incubated overnight at 14°C. The
subsequent steps were carried out in accordance with the
instructions of the TA Cloning® kit (Invitrogen). The
mixture was plated after transformation of the ligation
into E. coli INVαF' bacteria. At the end of the procedure,
the white colonies of recombinant bacteria were picked and
subcultured to allow extraction of the incorporated
plasmids by the so-called "DNA minipreparation" procedure
(17). The plasmid preparation from each recombinant colony
was cut with the restriction enzyme Eco RI and analyzed on
an agarose gel. Plasmids possessing an insert, detected
under UV light after staining the gel with ethidium
bromide, were selected for sequencing of the insert after
hybridization with a primer complementary to the T7
promoter present on the cloning plasmid of the TA Cloning®
kit. The reaction prior to sequencing was then carried out
according to the method recommended for use of the
sequencing kit "PRISM™ Ready Reaction AmpliTaq® FS,
DyeDeoxy™ Terminator" (Applied Biosystems, ref. 402119),
and automated sequencing was performed on Applied
Biosystems 373 A and 377 instruments, according to the
manufacturer's instructions.

The clone obtained, designated CL2, contains a C-terminal
region similar to the 5'-terminal region of the G+E+A
clones of MSRV-1, which makes it possible to define the
C-terminal region of the gag gene, and a new region
corresponding to the N-terminal region of the MSRV-1 gag
gene.

CL2 makes it possible to define a region of 1511 bp having,
in the N-terminal region, an open reading frame of 1077 bp
coding for 359 amino acids and a non-open reading frame of
454 bp corresponding to the C-terminal region of the
MSRV-1 gag gene.

The nucleotide sequence of CL2 is identified by SEQ ID
NO: 130. It is shown in Figure XX3,!, with the potential
amino acid reading frames.
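The relationship between the 1077 bp open reading frame and its 359 amino acids (1077 = 3 x 359) can be illustrated with a small translation routine. The sketch below uses the standard genetic code; the function name `translate` and the use of `*` for stop codons are choices made here, not conventions taken from the sequence listing. The test string is the first 30 nucleotides of SEQ ID NO: 130 as given in the listing.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, offset: int = 0) -> str:
    """Translate a DNA string in the reading frame starting at `offset`.

    Incomplete trailing codons are dropped; unknown codons (e.g. with N) map to 'X'.
    """
    seq = dna.replace(" ", "").upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 30 nucleotides of SEQ ID NO: 130 (clone CL2).
CL2_START = "CCTAGAACGT ATTCTGGAGA ATTGGGACCA"

print(translate(CL2_START, offset=1))  # second reading frame
print(1077 // 3, 1077 % 3)             # codon count of the open reading frame
```

Read in the second frame, the first codons give LERILENWD, which agrees with the start of the peptide listed as SEQ ID NO: 131.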

The 1077 bp fragment of CL2 coding for 359 amino acids was
amplified by PCR with the enzyme Pwo (5 U/µl) (Boehringer
Mannheim, France) using 1 µl of the DNA minipreparation of
clone 2 under the following conditions: 95°C for 1 min,
60°C for 1 min, 72°C for 2 min, for 25 cycles, in a final
reaction volume of 50 µl, using the primers:

- 5' primer (Bam HI), identified by SEQ ID NO: 132
5' TGC TGG AAT TCG GGA TCC TAG AAC GTA TTC 3' (30-mer), and

- 3' primer (Hind III), identified by SEQ ID NO: 133
5' AGT TCT GCT CCG AAG CTT AGG CAG ACT TTT 3' (30-mer)

corresponding, respectively, to the nucleotide sequence of
clone 2 at positions -9 to 21 and 1066 to 1095.
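The two expression primers carry the restriction sites named in parentheses. A short check, assuming the usual recognition sequences GGATCC (Bam HI) and AAGCTT (Hind III), locates each site within its primer; `find_site` is a hypothetical helper written for this illustration.

```python
# Recognition sequences of the two enzymes used for subcloning.
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_site(primer: str, enzyme: str) -> int:
    """Return the 0-based index of the enzyme site in the primer, or -1 if absent."""
    seq = primer.replace(" ", "").upper()
    return seq.find(SITES[enzyme])


SEQ_ID_132 = "TGC TGG AAT TCG GGA TCC TAG AAC GTA TTC"
SEQ_ID_133 = "AGT TCT GCT CCG AAG CTT AGG CAG ACT TTT"

print("Bam HI site in SEQ ID NO: 132 at index", find_site(SEQ_ID_132, "BamHI"))
print("Hind III site in SEQ ID NO: 133 at index", find_site(SEQ_ID_133, "HindIII"))
```

In both 30-mers the site starts at index 12, which leaves twelve nucleotides 5' of the site and twelve nucleotides 3' of it for annealing to clone 2.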

The fragment obtained after PCR was digested with Bam HI
and Hind III and subcloned into the expression vectors
pET28C and pET21C (NOVAGEN) linearized with Bam HI and
Hind III. Sequencing of the DNA of the 1077 bp fragment of
clone 2 in the two expression vectors was carried out
according to the method recommended for use of the
sequencing kit "PRISM™ Ready Reaction AmpliTaq® FS,
DyeDeoxy™ Terminator" (Applied Biosystems, ref. 402119),
and automated sequencing was performed on Applied
Biosystems 373 A and 377 instruments, according to the
manufacturer's instructions.

The expression products of the nucleotide sequence of the
1077 bp fragment of clone 2 in the expression vectors
pET28C and pET21C are identified by SEQ ID NO: 135 and
SEQ ID NO: 137, respectively.

EXAMPLE 6: EXPRESSION OF CLONE 2 IN ESCHERICHIA COLI

The constructs pET28C-clone 2 (1077 bp) and pET21C-clone 2
(1077 bp) synthesize, in the bacterial strain BL21 (DE3),
a protein fused to 6 histidines, N- and C-terminally for
the vector pET28C and C-terminally for the vector pET21C,
with an apparent molecular mass of approximately 45 kDa,
demonstrated by SDS-PAGE polyacrylamide gel
electrophoresis (SDS = sodium dodecyl sulfate) (Laemmli,
1970 (1)). The reactivity of the protein toward an
anti-histidine monoclonal antibody (DIANOVA) was
demonstrated by the Western blot technique (Towbin et al.,
1979 (2)).

The recombinant proteins pET28C-clone 2 (1077 bp) and
pET21C-clone 2 (1077 bp) were visualized by SDS-PAGE in
the insoluble fraction after enzymatic digestion of the
bacterial extracts with 50 µl of lysozyme (10 mg/ml) and
lysis by sonication.

The antigenic properties of the recombinant antigens
pET28C-clone 2 (1077 bp) and pET21C-clone 2 (1077 bp) were
tested by Western blot () after solubilization of the
bacterial pellet with 2% SDS and 50 mM β-mercaptoethanol.
After incubation with sera from patients suffering from
multiple sclerosis, sera from neurological controls and
sera from blood transfusion center (CTS) controls, the
immunocomplexes were detected using a goat anti-human IgG
and anti-human IgM serum coupled to alkaline phosphatase.

The results are presented in the table below.

TABLE

Reactivity of sera from multiple sclerosis patients and
controls with the recombinant MSRV-1 gag clone 2 (1077 bp)
protein = pET21C-clone 2 (1077 bp) and pET28C-clone 2
(1077 bp) (a)

DISEASE                   NUMBER OF             NUMBER OF
                          INDIVIDUALS TESTED    POSITIVE INDIVIDUALS

MS                        15                    6
                                                2 (+++), 2 (++), 2 (+)

NEUROLOGICAL CONTROLS     2                     1 (+++)

HEALTHY CONTROLS (CTS)    22                    1 (+/-)



(a) The strips containing 1.5 µg of recombinant antigen
pET-gag clone 2 (1077 bp) show reactivity against sera
diluted 1/100. Interpretation of the Western blot is based
on the presence or absence of a specific pET-gag clone 2
(1077 bp) band on the strips. Positive and negative
controls are included in each experiment.

These results show that, under the technical conditions
used, approximately 40% of the human sera from multiple
sclerosis patients tested react with the recombinant
proteins pET28C-clone 2 (1077 bp) and pET21C-clone 2
(1077 bp). Reactivity was observed in one neurological
control, and it is interesting to note that the RNA
extracted from this serum, after the reverse transcription
step, is also amplified by PCR in the pol region. This
suggests that persons who have not developed overt MS may
also harbor and express this virus. On the other hand, one
apparently healthy control (CTS donor) possesses anti-gag
(clone 2, 1077 bp) antibodies. This is compatible with
immunity acquired against MSRV-1 in the absence of an
overt associated autoimmune disease.
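The "approximately 40%" figure follows directly from the counts in the table (6 positive out of 15 MS sera). The short sketch below recomputes the proportion for each group; the dictionary layout and the helper `percent_positive` are illustrative only, and no statistical test is implied given the small group sizes.

```python
# (positive, tested) per group, taken from the table above.
counts = {
    "MS": (6, 15),
    "Neurological controls": (1, 2),
    "Healthy controls (CTS)": (1, 22),
}


def percent_positive(positive: int, tested: int) -> float:
    """Percentage of tested individuals scored positive."""
    return 100.0 * positive / tested


for group, (positive, tested) in counts.items():
    print(f"{group}: {positive}/{tested} = {percent_positive(positive, tested):.1f}%")
```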

EXAMPLE 7: PRODUCTION OF A CLONE LB13 CONTAINING, AT THE
3' END, A PORTION HOMOLOGOUS TO CLONE 2 CORRESPONDING TO
THE GAG GENE AND, AT THE 5' END, A PORTION HOMOLOGOUS TO
CLONE 5M6 CORRESPONDING TO THE LTR U5 REGION.

An RT-PCR (reverse transcriptase-polymerase chain
reaction) was performed on total RNA extracted from
virions obtained from supernatants of B lymphocyte cells
from patients suffering from multiple sclerosis and
concentrated by ultracentrifugation. cDNA synthesis was
carried out with a specific primer SEQ No. XXX and
"Expand™ RT" reverse transcriptase from BOEHRINGER
MANNHEIM according to the conditions recommended by the
company.



Primer used for cDNA synthesis, identified by
SEQ ID NO: 138:

5' CTT GGA GGG TGC ATA ACC AGG GAA T 3'

A PCR amplification was carried out with Taq polymerase
(Perkin Elmer, France) under the following conditions:
94°C for 1 min, 55°C for 1 min, 72°C for 2 min, for 35
cycles, followed by 72°C for 7 min, in a final reaction
volume of 100 µl.

Primers used for the PCR amplification:

- 5' primer, identified by SEQ ID NO: 139
5' TGT CCG CTG TGC TCC TGA TC 3'

- 3' primer, identified by SEQ ID NO: 138
5' CTT GGA GGG TGC ATA ACC AGG GAA T 3'

A second PCR amplification, termed "semi-nested", was
carried out with a 3' primer located within the region
already amplified. This second amplification was performed
under the same experimental conditions as those used for
the first amplification, using 10 µl of the amplification
product from the first PCR.

Primers used for the "semi-nested" PCR amplification:

- 5' primer, identified by SEQ ID NO: 139
5' TGT CCG CTG TGC TCC TGA TC 3'

- 3' primer, identified by SEQ ID NO: 140
5' CTA TGT CCT TTT GGA CTG TTT GGG T 3'

Primers SEQ ID NO: 138 and SEQ ID NO: 140 are specific for
the gag region, clone 2, nucleotide positions 373-397 and
433-456. The primers used in the 5' region were defined
from sequences of clones obtained in preliminary trials.
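Because the 3' primers anneal to the sense strand, locating them in a clone sequence means searching for their reverse complement. The following sketch shows that operation for the two 3' primers; `reverse_complement` is a generic helper written for this illustration and handles only the four unambiguous bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(primer: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    seq = primer.replace(" ", "").upper()
    return seq.translate(COMPLEMENT)[::-1]


SEQ_ID_138 = "CTT GGA GGG TGC ATA ACC AGG GAA T"
SEQ_ID_140 = "CTA TGT CCT TTT GGA CTG TTT GGG T"

print("SEQ ID NO: 138, sense-strand target:", reverse_complement(SEQ_ID_138))
print("SEQ ID NO: 140, sense-strand target:", reverse_complement(SEQ_ID_140))
```

The printed strings are what one would search for in the sense strand of a clone; an exact match is not guaranteed, since primers may have been designed on related clones.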

An amplification product of 764 bp was obtained and cloned
as follows:

The amplified DNA was inserted into a plasmid using the
TA Cloning™ kit. The 2 µl of DNA solution were mixed with
5 µl of sterile distilled water, 1 µl of 10-fold
concentrated ligation buffer ("10X LIGATION BUFFER"), 2 µl
of "pCR™ VECTOR" (25 ng/ml) and 1 µl of "T4 DNA LIGASE".
This mixture was incubated overnight at 14°C. The
subsequent steps were carried out in accordance with the
instructions of the TA Cloning® kit (Invitrogen). The
mixture was plated after transformation of the ligation
into E. coli INVαF' bacteria. At the end of the procedure,
the white colonies of recombinant bacteria were picked and
subcultured to allow extraction of the incorporated
plasmids by the so-called "DNA minipreparation" procedure
(17). The plasmid preparation from each recombinant colony
was cut with the restriction enzyme Eco RI and analyzed on
an agarose gel. Plasmids possessing an insert, detected
under UV light after staining the gel with ethidium
bromide, were selected for sequencing of the insert after
hybridization with a primer complementary to the T7
promoter present on the cloning plasmid of the TA Cloning®
kit. The reaction prior to sequencing was then carried out
according to the method recommended for use of the
sequencing kit "PRISM™ Ready Reaction AmpliTaq® FS,
DyeDeoxy™ Terminator" (Applied Biosystems, ref. 402119),
and automated sequencing was performed on Applied
Biosystems 373 A and 377 instruments, according to the
manufacturer's instructions.

The clone obtained, LB13, contains an N-terminal region of
the MSRV-1 gag gene homologous to clone 2 and an LTR
corresponding to part of the U5 region. Between the U5
region and gag, a binding site for transfer RNAs, the PBS
("primer binding site"), was identified.

The nucleotide sequence of the 764 bp fragment of clone
LB13 in the plasmid "pCR™ vector" is shown under the
identifier SEQ ID NO: 141.

The transfer RNA binding site, which has a sequence of the
tryptophan PBS type, was identified at nucleotide
positions 342-359 of clone LB13.
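The positions quoted for the PBS are 1-based and inclusive, as in the sequence listing. The sketch below extracts positions 342-359 from the 301-360 row of SEQ ID NO: 141 as printed in the listing; the helper `slice_1based` is a hypothetical name introduced for this illustration.

```python
# Row of SEQ ID NO: 141 covering nucleotides 301-360, as printed in the listing.
ROW_START = 301
ROW_301_360 = "TATCTTGGGA GCTCTGGGAG CAAGGACCCC AGGTAACAAT TTGGTGACCA CGAAGGGACC"


def slice_1based(row: str, row_start: int, first: int, last: int) -> str:
    """Return nucleotides first..last (1-based, inclusive) from a listing row."""
    seq = row.replace(" ", "")
    return seq[first - row_start:last - row_start + 1]


pbs = slice_1based(ROW_301_360, ROW_START, 342, 359)
print(pbs, len(pbs))
```

The extracted 18-nucleotide stretch is the region that the text designates as the PBS of clone LB13.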

Another clone, designated LA15, was obtained from total
RNA extracted from virions concentrated by
ultracentrifugation from a culture supernatant of
synoviocytes from a patient suffering from rheumatoid
arthritis. The amplification and cloning strategy for
clone LA15 is exactly the same as that used for clone
LB13.

The nucleotide sequence of clone LA15, which is shown
under the identifier SEQ ID NO: 142, is very similar to
that of clone LB13. This suggests that the MSRV-1
retrovirus detected in multiple sclerosis has sequences
similar to those encountered in rheumatoid arthritis.
SEQUENCE LISTING



(2) INFORMATION FOR SEQ ID NO: 68:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
GACTCGCTGC AGATCGATTT TTTTTTTTTT TTTT 



(2) INFORMATION FOR SEQ ID NO: 69:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
GCCATCAAGC CACCCAAGAA CTCTTAACTT 



(2) INFORMATION FOR SEQ ID NO: 70:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
CCAATAGCCA GACCATTATA TACACTAATT 
(2) INFORMATION FOR SEQ ID NO: 112
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 310 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112:

GCTTATAGAA GGACCCCTAG TATGGGGTAA TCCCCTCTGG GAAACCAAGC CCCAGTACTC 60
AGCAGGAAAA ATAGAATAGG AAACCTCACA AGGACATACT TTCCTCCCCT CCAGATGGCT 120
AGCCACTGAG GAAGGAAAAA TACTTTCACC TGCAGCTAAC CAACAGAAAT TACTTAAAAC 180
CCTTCACCAA ACCTTCCACT TAGGCATTGA TAGCACCCAT CAGATGGCCA AATTATTATT 240
TACTGGACCA GGCCTTTTCA AAACTATCAA GAAGATAGTC AGGGGCTGTG AAGTGTGCCA 300
AAGAAATAAT 310

(2) INFORMATION FOR SEQ ID NO: 113
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 103 amino acids
(B) TYPE: peptide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113:
LIEGPLVWGNPLWETKPQYSAGKIEXETSQGHTFLPSRWLATEEGKILSPAANQQKLLKTLHQTFHLGID
STHQMAKLLFTGPGLFKTIKKIVRGCEVCQRNN



(2) INFORMATION FOR SEQ ID NO: 114
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 635 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114:
CCCTGTATCT TTAACCTCCT TGTTAAGTTT GTCTCTTCCA GAATCAAAAC TGTAAAACTA 60 
CAAATTGTTC TTCAAATGGA GCACCAGATG GAGTCCATGA CTAAGATCCA CCGTGGACCC 120 
CTGGACCGGC CTGCTAGCCC ATGCTCCGAT GTTAATGACA TTGAAGGCAC CCCTCCCGAG 180 
GAAATCTCAA CTGCACAACC CCTACTATGC CCCAATTCAG CGGGAAGCAG TTAGAGCGGT 240 
CATCAGCCAA CCTCCCCAAC AGCACTTGGG TTTTCCTGTT GAGAGGGGGG ACTGAGAGAC 300 



AGGACTAGCT GGATTTCCTA GGCCAACGAA GAATCCCTAA GCCTAGCTGG GAAGGTGACT 360 

GCATCCACCT CTAAACATGG GGCTTGCAAC TTAGCTCACA CCCGACCAAT CAGAGAGCTC 420 

ACTAAAATGC TAATTAGGCA AAAATAGGAG GTAAAGAAAT AGCCAATCAT CTATTGCCTG 480 

AGAGCACAGC GGGAGGGACA AGGATCGGGA TATAAACCCA GGCATTCGAG CCGGCAACGG 540 

CAACCCCCTT TGGGTCCCCT CCCTTTGTAT GGGCGCTCTG TTTTCACTCT ATTTCACTCT 600 

ATTAAATCTT GCAACTGAAA AAAAAAAAAA AAAAA 635 

(2) INFORMATION FOR SEQ ID NO: 115
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 77 amino acids
(B) TYPE: peptide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115:
PCIFNLLVKFVSSRIKTVKLQIVLQMEHQMESMTKIHRGPLDRPASPCSDVNDIEGTPPEEISTAQPLLC
PNSAGSS

(2) INFORMATION FOR SEQ ID NO: 116
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116:
TGGGGTTCCA TTTGTAAGAC CATCTGTAGC TT 32 

(2) INFORMATION FOR SEQ ID NO: 117
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1481 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117:

ATGGCCCTCC CTTATCATAC TTTTCTCTTT ACTGTTCTCT TACCCCCTTT CGCTCTCACT 60 

GCACCCCCTC CATGCTGCTG TACAACCAGT AGCTCCCCTT ACCAAGAGTT TCTATGAAGA 120 

ACGCGGCTTC CTGGAAATAT TGATGCCCCA TCATATAGGA GTTTATCTAA GGGAAACTCC 180
ACCTTCACTG CCCACACCCA TATGCCCCGC AACTGCTATA ACTCTGCCAC TCTTTGCATG 240
CATGCAAATA CTCATTATTG GACAGGGAAA ATGATTAATC CTAGTTGTCC TGGAGGACTT 300 
GGAGCCACTG TCTGTTGGAC TTACTTCACC CATACCAGTA TGTCTGATGG GGGTGGAATT 360 
CAAGGTCAGG CAAGAGAAAA ACAAGTAAAG GAAGCAATCT CCCAACTGAC CCGGGGACAT 420 
AGCACCCCTA GCCCCTACAA AGGACTAGTT CTCTCAAAAC TACATGAAAC CCTCCGTACC 480 
CATACTCGCC TGGTGAGCCT ATTTAATACC ACCCTCACTC GGCTCCATGA GGTCTCAGCC 540 
CAAAACCCTA CTAACTGTTG GATGTGCCTC CCCCTGCACT TCAGGCCATA CATTTCAATC 600 
CCTGTTCCTG AACAATGGAA CAACTTCAGC ACAGAAATAA ACACCACTTC CGTTTTAGTA 660 
GGACCTCTTG TTTCCAATCT GGAAATAACC CATACCTCAA ACCTCACCTG TGTAAAATTT 720 
AGCAATACTA TAGACACAAC CAGCTCCCAA TGCATCAGGT GGGTAACACC TCCCACACGA 780 
ATAGTCTGCC TACCCTCAGG AATATTTTTT GTCTGTGGTA CCTCAGCCTA TCATTGTTTG 840 
AATGGCTCTT CAGAATCTAT GTGCTTCCTC TCATTCTTAG TGCCCCCTAT GACCATCTAC 900 
ACTGAACAAG ATTTATACAA TCATGTCGTA CCTAAGCCCC ACAACAAAAG AGTACCCATT 960 
CTTCCTTTTG TTATCAGAGC AGGAGTGCTA GGCAGACTAG GTACTGGCAT TGGCAGTATC 1020 
ACAACCTCTA CTCAGTTCTA CTACAAACTA TCTCAAGAAA TAAATGGTGA CATGGAACAG 1080 
GTCACTGACT CCCTGGTCAC CTTGCAAGAT CAACTTAACT CCCTAGCAGC AGTAGTCCTT 1140 
CAAAATCGAA GAGCTTTAGA CTTGCTAACC GCCAAAAGAG GGGGAACCTG TTTATTTTTA 1200 
GGAGAAGAAC GCTGTTATTA TGTTAATCAA TCCAGAATTG TCACTGAGAA AGTTAAAGAA 1260
ATTCGAGATC GAATACAATG TAGAGCAGAG GAGCTTCAAA ACACCGAACG CTGGGGCCTC 1320 
CTCAGCCAAT GGATGCCCTG GGTTCTCCCC TTCTTAGGAC CTCTAGCAGC TCTAATATTG 1380 
TTACTCCTCT TTGGACCCTG TATCTTTAAC CTCCTTGTTA AGTTTGTCTC TTCCAGAATT 1440 
GAAGCTGTAA AGCTACAGAT GGTCTTACAA ATGGAACCCC A 1481 

(2) INFORMATION FOR SEQ ID NO: 118
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 493 amino acids
(B) TYPE: peptide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118:
MALPYHTFLFTVLLPPFALTAPPPCCCTTSSSPYQEFLXRTRLPGNIDAPSYRSLSKGNSTFTAHTHMPR
NCYNSATLCMHANTHYWTGKWINPSCPGGLGATVCWTYFTHTSMSDGGGIQGQAREKQVKEAISQLTRGH
STPSPYKGLVLSKLHETLRTHTRLVSLFNTTLTRLHEVSAQNPTNCWMCLPLHFRPYISIPVPEQWNNFS
TEINTTSVLVGPLVSNLEITHTSNLTCVKFSNTIDTTSSQCIRWVTPPTRIVCLPSGIFFVCGTSAYHCL
NGSSESMCFLSFLVPPMTIYTEQDLYNHVVPKPHNKRVPILPFVIRAGVLGRLGTGIGSITTSTQFYYKL
SQEINGDMEQVTDSLVTLQDQLNSLAAVVLQNRRALDLLTAKRGGTCLFLGEERCYYVNQSRIVTEKVKE
IRDRIQCRAEELQNTERWGLLSQWMPWVLPFLGPLAALILLLLFGPCIFNLLVKFVSSRIEAVKLQMVLQ
MEP

(2) INFORMATION FOR SEQ ID NO: 119
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119:
TCAAAATCGA AGAGCTTTAG ACTTGCTAAC CG 32 

(2) INFORMATION FOR SEQ ID NO: 120
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1329 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120:

TCAAAATCGA AGAGCTTTAG ACTTGCTAAC CGCCAAAAGA GGGGGAACCT GTTTATTTTT 60 

AGGGGAAGAA TGCTGTTAGT ATGTTAATCA ATCTGGAATC ATTACTGAGA AAGTTAAAGA 120 

AATTTGAGAT CGAATATAAT GTAGAGCAGA GGACCTTCAA AACACTGCAC CCTGGGGCCT 180

CCTCAGCCAA TGGATGCCCT GGACTCTCCC CTTCTTAGGA CCTCTAGCAG CTATAATATT 240 

TTTACTCCTC TTTGGACCCT GTATCTTCAA CTTCCTTGTT AAGTTTGTCT CTTCCAGAAT 300 

TGAAGCTGTA AAGCTACAAA TAGTTCTTCA AATGGAACCC CAGATGCAGT CCATGACTAA 360 

AATCTACCGT GGACCCCTGG ACCGGCCTGC TAGACTATGC TCTGATGTTA ATGACATTGA 420 

AGTCACCCCT CCCGAGGAAA TCTCAACTGC ACAACCCCTA CTACACTCCA ATTCAGTAGG 480

AAGCAGTTAG AGCAGTTGTC AGCCAACCTC CCCAACAGTA CTTGGGTTTT CCTGTTGAGA 540 

GGGTGGACTG AGAGACAGGA CTAGCTGGAT TTCCTAGGCT GACTAAGAAT CCCNAAGCCT 600 

ANCTGGGAAG GTGACCGCAT CCATCTTTAA ACATGGGGCT TGCAACTTAG CTCACACCCG 660 

ACCAATCAGA GAGCTCACTA AAATGCTAAT CAGGCAAAAA CAGGAGGTAA AGCAATAGCC 720 

AATCATCTAT TGCCTGAGAG CACAGCGGGA AGGACAAGGA TTGGGATATA AACTCAGGCA 780 

TTCAAGCCAG CAACAGCAAC CCCCTTTGGG TCCCCTCCCA TTGTATGGGA GCTCTGTTTT 840 

CACTCTATTT CACTCTATTA AATCATGCAA CTGCACTCTT CTGGTCCGTG TTTTTTATGG 900 

CTCAAGCTGA GCTTTTGTTC GCCATCCACC ACTGCTGTTT GCCACCGTCA CAGACCCGCT 960 

GCTGACTTCC ATCCCTTTGG ATCCAGCAGA GTGTCCACTG TGCTCCTGAT CCAGCGAGGT 1020 



ACCCATTGCC ACTCCCGATC AGGCTAAAGG CTTGCCATTG TTCCTGCATG GCTAAGTGCC 1080

TGGGTTTGTC CTAATAGAAC TGAACACTGG TCACTGGGTT CCATGGTTCT CTTCCATGAC 1140

CCACGGCTTC TAATAGAGCT ATAACACTCA CCGCATGGCC CAAGATTCCA TTCCTTGGTA 1200

TCTGTGAGGC CAAGAACCCC AGGTCAGAGA ANGTGAGGCT TGCCACCATT TGGGAAGTGG 1260

CCCACTGCCA TTTTGGTAGC GGCCCACCAC CATCTTGGGA GCTGTGGGAG CAAGGATCCC 1320

CCAGTAACA 1329



(2) INFORMATION FOR SEQ ID NO: 121
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 162 amino acids
(B) TYPE: peptide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121:
QNRRALDLLTAKRGGTCLFLGEECCXYVNQSGIITEKVKEIXDRIXCRAEDLQNTAPWGLLSQWMPWTLP
FLGPLAAIIFLLLFGPCIFNFLVKFVSSRIEAVKLQIVLQMEPQMQSMTKIYRGPLDRPARLCSDVNDIE
VTPPEEISTAQPLLHSNSVGSS



(2) INFORMATION FOR SEQ ID NO: 122
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122:
GGCATTGATA GCACCCATCA G 21 

(2) INFORMATION FOR SEQ ID NO: 123
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123:
CATGTCACCA GGGTGGAATA G 21 
(2) INFORMATION FOR SEQ ID NO: 124
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:



(2) INFORMATION FOR SEQ ID NO: 126
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126:
CATGTCACCA GGGTGGAATA G 21 



(2) INFORMATION FOR SEQ ID NO: 127
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127:
GGACAGGAAA GTAAGACTGA GAAGGC 26



(2) INFORMATION FOR SEQ ID NO: 128
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128:

(2) INFORMATION FOR SEQ ID NO: 129
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129:

(2) INFORMATION FOR SEQ ID NO: 130
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1511 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130:





CCTAGAACGT ATTCTGGAGA ATTGGGACCA ATGTGACACT CAGACGCTAA GAAAGAAACG 60
ATTTATATTC TTCTGCAGTA CCGCCTGGCC ACAATATCCT CTTCAAGGGA GAGAAACCTG 120
GCTTCCTGAG GGAAGTATAA ATTATAACAT CATCTTACAG CTAGACCTCT TCTGTAGAAA 180
GGAGGGCAAA TGGAGTGAAG TGCCATATGT GCAAACTTTC TTTTCATTAA GAGACAACTC 240
ACAATTATGT AAAAAGTGTG GTTTATGCCC TACAGGAAGC CCTCAGAGTC CACCTCCCTA 300
CCCCAGCGTC CCCTCCCCGA CTCCTTCCTC AACTAATAAG GACCCCCCTT TAACCCAAAC 360
GGTCCAAAAG GAGATAGACA AAGGGGTAAA CAATGAACCA AAGAGTGCCA ATATTCCCCG 420
ATTATGCCCC CTCCAAGCAG TGAGAGGAGG AGAATTCGGC CCAGCCAGAG TGCCTGTACC 480
TTTTTCTCTC TCAGACTTAA AGCAAATTAA AATAGACCTA GGTAAATTCT CAGATAACCC 540
TGACGGCTAT ATTGATGTTT TACAAGGGTT AGGACAATCC TTTGATCTGA CATGGAGAGA 600
TATAATGTTA CTACTAAATC AGACACTAAC CCCAAATGAG AGAAGTGCCG CTGTAACTGC 660
AGCCCGAGAG TTTGGCGATC TTTGGTATCT CAGTCAGGCC AACAATAGGA TGACAACAGA 720
GGAAAGAACA ACTCCCACAG GCCAGCAGGC AGTTCCCAGT GTAGACCCTC ATTGGGACAC 780
AGAATCAGAA CATGGAGATT GGTGCCACAA ACATTTGCTA ACTTGCGTGC TAGAAGGACT 840
GAGGAAAACT AGGAAGAAGC CTATGAATTA CTCAATGATG TCCACTATAA CACAGGGAAA 900
GGAAGAAAAT CTTACTGCTT TTCTGGACAG ACTAAGGGAG GCATTGAGGA AGCATACCTC 960
CCTGTCACCT GACTCTATTG AAGGCCAACT AATCTTAAAG GATAAGTTTA TCACTCAGTC 1020
AGCTGCAGAC ATTAGAAAAA ACTTCAAAAG TCTGCCTTAG GCCCGGAGCA GAACTTAGAA 1080
ACCCTATTTA ACTTGGCATC CTCAGTTTTT TATAATAGAG ATCAGGAGGA GCAGGCGAAA 1140
CGGGACAAAC GGGATAAAAA AAAAAGGGGG GGTCCACTAC TTTAGTCATG GCCCTCAGGC 1200
AAGCAGACTT TGGAGGCTCT GCAAAAGGGA AAAGCTGGGC AAATCAAATG CCTAATAGGG 1260
CTGGCTTCCA GTGCGGTCTA CAAGGACACT TTAAAAAAGA TTATCCAAGT AGAAATAAGC 1320
CGCCCCCTTG TCCATGCCCC TTACGTCAAG GGAATCACTG GAAGGCCCAC TGCCCCAGGG 1380
GATGAAGATA CTCTGAGTCA GAAGCCATTA ACCAGATGAT CCAGCAGCAG GACTGAGGGT 1440
GCCCGGGGCG AGCGCCAGCC CATGCCATCA CCCTCACAGA GCCCCGGGTA TGTTTGACCA 1500
TTGAGAGCCA A 1511



(2) INFORMATION FOR SEQ ID NO: 131
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 347 amino acids
(B) TYPE: peptide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131:
LERILENWDQCDTQTLRKKRFIFFCSTAWPQYPLQGRETWLPEGSINYNI ILQLDLFCRKEGKWSEVPYV 
QTFFSLRDNSQLCKKCGLCPTGSPQSPPPYPSVPSPTPSSTNKDPPLTQTVQKEIDKGVNNEPKSANIPR 
LCPLQAVRGGEFGPARVPVPFSLSDLKQIKIDLGKFSDNPDGYIDVLQGLGQSFDLTWRDIMLLLNQTLT 
PNERSAAVTAAREFGDLWYLSQANNRMTTEERTTPTGQQAVPSVDPHWDTESEHGDWCHKHLLTCVLEGL 
RKTRKKPMNYSMMSTITQGKEENLTAFLDRLREALRKHTSLSPDSIEGQLILKDKFITQSAADIRKNFKS 
LP 



(2) INFORMATION FOR SEQ ID NO: 132
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132:
TGCTGGAATT CGGGATCCTA GAACGTATTC 30 



(2) INFORMATION FOR SEQ ID NO: 133
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133:
AGTTCTGCTC CGAAGCTTAG GCAGACTTTT 30



(2) INFORMATION FOR SEQ ID NO: 135
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: amino acids
(B) TYPE: peptide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135:
MGSSHHHHHHSSGLVPRGSHMASMTGGQQMGRILERILENWDQCDTQTLRKKRFIFFCSTAWPQYPLQGR 
ETWLPEGSINYNIILQLDLFCRKEGKWSEVPYVQTFFSLRDNSQLCKKCGLCPTGSPQSPPPYPSVPSPT 
PSSTNKDPPLTQTVQKEIDKGVNNEPKSANIPRLCPLQAVRGGEFGPARVPVPFSLSDLKQIKIDLGKFS 
DNPDGYIDVLQGLGQSFDLTWRDIMLLLNQTLTPNERSAAVTAAREFGDLWYLSQANNRMTTEERTTPTG 
QQAVPSVDPHWDTESEHGDWCHKHLLTCVLEGLRKTRKKPMNYSMMSTITQGKEENLTAFLDRLREALRK 
HTSLSPDSIEGQLILKDKFITQSAADIRKNFKSLPKLAAALEHHHHHH 



(2) INFORMATION FOR SEQ ID NO: 137
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: amino acids
(B) TYPE: peptide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137:



MASMTGGQQMGRILERILENWDQCDTQTLRKKRFIFFCSTAWPQYPLQGRETWLPEGSINYNIILQLDLF 
CRKEGKWSEVPYVQTFFSLRDNSQLCKKCGLCPTGSPQSPPPYPSVPSPTPSSTNKDPPLTQTVQKEIDK 
GVNNEPKSANIPRLCPLQAVRGGEFGPARVPVPFSLSDLKQIKIDLGKFSDNPDGYIDVLQGLGQSFDLT 
WRDIMLLLNQTLTPNERSAAVTAAREFGDLWYLSQANNRMTTEERTTPTGQQAVPSVDPHWDTESEHGDW
CHKHLLTCVLEGLRKTRKKPMNYSMMSTITQGKEENLTAFLDRLREALRKHTSLSPDSIEGQLILKDKFI 
TQSAADIRKNFKSLPKLAAALEHHHHHH 

(2) INFORMATION FOR SEQ ID NO: 138
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138:
CTTGGAGGGT GCATAACCAG GGAAT 25



(2) INFORMATION FOR SEQ ID NO: 139
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139:
TGTCCGCTGT GCTCCTGATC 20 

(2) INFORMATION FOR SEQ ID NO: 140
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:
CTATGTCCTT TTGGACTGTT TGGGT 25 

(2) INFORMATION FOR SEQ ID NO: 141
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 764 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141:

TGTCCGCTGT GCTCCTGATC CAGCACAGGC GCCCATTGCC TCTCCCAATT GGGCTAAAGG 60 

CTTGCCATTG TTCCTGCACA GCTAAGTGCC TGGGTTCATC CTAATCGAGC TGAACACTAG 120 

TCACTGGGTT CCACGGTTCT CTTCCATGAC CCATGGCTTC TAATAGAGCT ATAACACTCA 180 

2 5 CTGCATGGTC CAAGATTCCA TTCCTTGGAA TCCGTGAGAC CAAGAACCCC AGGTCAGAGA 240 

ACACAAGGCT TGCCACCATG TTGGAAGCAG CCCACCACCA TTTTGGAAGC AGCCCGCCAC 300 

TATCTTGGGA GCTCTGGGAG CAAGGACCCC AGGTAACAAT TTGGTGACCA CGAAGGGACC 360 

TGAATCCGCA ACCATGAAGG GATCTCCAAA GCAATTGGAA ATGTTCCTCC CAAGGCAAAA 420 

ATGCCCCTAA GATGTATTCT GGAGAATTGG GACCAATTTG ACCCTCAGAC AGTAAGAAAA 480 

3 0 AAATGACTTA TATTCTTCTG CAGTACCGCC CTGGCCACGA TATCCTCTTC AAGGGGGAGA 540 

AACCTGGCCT CCTGAGGGAA GTATAAATTA TAACACCATC TTACAGCTAG ACCTGTTTTG 600 

TAGAAAAGGA GGCAAATGGA GTGAAGTGCC ATATTTACAA ACTTTCTTTT CATTAAAAGA 660 

CAACTCGCAA TTATGTTAAC AGTGTGATTT GTGTTCCTAC ACGGAAGCCC TCAGATTCTA 720 

CTCCCCACCC CCGGCATCTC CCCTGAATCC CTCCCCAACT TATT 764 
(2) INFORMATION FOR SEQ ID NO: 142

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 800 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142:

TGTCCGCTGT GCTCCTGATC CAGCACAGGC GCCCATTGCC TCTCCCAATT GGGCTAAAGG 60
CTTGCCATTG TTCCTGCACA GCTAAGTGCC TGGGTTCATC CTAATCGAGC TGAACACTAG 120
TCACTGGGTT CCACGGTTCT CTTCCATGAC CCATGGCTTC TAATAGAGCT ATAACACTCA 180
CTGCATGGTC CAAGATTCCA TTCCTTGGAA TCCGTGAGAC CAAGAACCCC AGGTCAGAGA 240
ACACAAGGCT TGCCACCATG TTGGAAGCAG CCCACCACCA TTTTGGAAGC GGCCCGCCAC 300
TATCTTGGGA GCTCTGGGAG CAAGGACCCC CAGGTAACAA TTTGGTGACC ACGAAGGGAC 360
CTGAATCCGC AACCATGAAG GGATCTCCAA AGCAATTGGA AATGTTCCTC CCAAGGCAAA 420
AATGCCCCTA AGATGTATTC TGGAGAATTG GGACCAATCT GACCCTCAGA CAGTAAGAAA 480
AAAAATGACT TATATTCTTC TGCAGTACCG CCTGGCCACG GATATCCTCT TCAAGGGGGA 540
GAAACCTGGC CTCCTGAGGG AAGTATAAAT TATAACACCA TCTTACAGCT AGACCTGTTT 600
TGTAGAAAAG GAGGCAAATG GAGTGAAGTG CCATATTTAC AAACTTTCTT TTCATTAAAA 660
GACAACTCGC AATTATGTAA ACAGTGTGAT TTGTGTCCTA CAGGAAGCCC TCAGATCTAC 720
CTCCCTACCC CGGCATCTCC CTGACTCCTT CCCCAACTAA TAAGGACCCA CTTCAGCCCA 780
AACAGTCCAA AAGGACATAG 800
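
The entries above all use the same row layout: blocks of ten residues followed by the cumulative count at the end of the row. A minimal Python sketch of how such rows could be joined and the printed counts checked is given below. The function name `parse_listing_rows` is hypothetical and not part of the original document, and the check assumes that only the letters A, C, G, T and N occur in a nucleotide row.

```python
import re


def parse_listing_rows(rows):
    """Join sequence-listing rows into one string and check running counts.

    Each row holds blocks of up to ten letters, optionally followed by the
    cumulative residue count printed at the end of the row.
    """
    sequence = ""
    for row in rows:
        tokens = row.split()
        count = None
        if tokens and tokens[-1].isdigit():
            count = int(tokens.pop())  # trailing cumulative count
        block = "".join(tokens).upper()
        if not re.fullmatch(r"[ACGTN]*", block):
            raise ValueError(f"unexpected character in row: {row!r}")
        sequence += block
        if count is not None and count != len(sequence):
            raise ValueError(
                f"row ends at residue {len(sequence)}, listing says {count}"
            )
    return sequence


# SEQ ID NO: 139 and the first row of SEQ ID NO: 141, as printed above.
seq_139 = parse_listing_rows(["TGTCCGCTGT GCTCCTGATC 20"])
seq_141_row1 = parse_listing_rows(
    ["TGTCCGCTGT GCTCCTGATC CAGCACAGGC GCCCATTGCC TCTCCCAATT GGGCTAAAGG 60"]
)

print(len(seq_139))                      # 20
print(seq_141_row1.startswith(seq_139))  # True
```

A check of this kind flags a dropped or duplicated block as soon as a printed count disagrees with the accumulated length. It also shows that the 20 nucleotides of SEQ ID NO: 139 are identical to the first 20 nucleotides of SEQ ID NO: 141 as listed.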
CLAIMS

1. Nucleic material, in the isolated or purified state,
comprising a nucleotide sequence chosen from the group
consisting of (i) the sequences SEQ ID NO: 112,
SEQ ID NO: 114, SEQ ID NO: 117, SEQ ID NO: 120,
SEQ ID NO: 124, SEQ ID NO: 130, SEQ ID NO: 141 and
SEQ ID NO: 142; (ii) the sequences complementary to the
sequences (i); and (iii) the sequences equivalent to the
sequences (i) or (ii), in particular the sequences
displaying, for any succession of 100 contiguous monomers,
at least 50%, and preferably at least 70%, homology with
the sequences (i) or (ii), respectively.

2. Nucleic material, in the isolated or purified state,
coding for a polypeptide displaying, for any contiguous
succession of at least 30 amino acids, at least 50%, and
preferably at least 70%, homology with a peptide sequence
chosen from the group consisting of SEQ ID NO: 113,
SEQ ID NO: 115, SEQ ID NO: 118, SEQ ID NO: 121,
SEQ ID NO: 135 and SEQ ID NO: 137.

3. Retroviral nucleic material, the pol gene of which
comprises a nucleotide sequence identical or equivalent to
a sequence chosen from the group consisting of
SEQ ID NO: 112, SEQ ID NO: 124 and their complementary
sequences.

4. Retroviral nucleic material, in which the 5' end of
the pol gene begins at nucleotide 1419 of SEQ ID NO: 130.

5. Retroviral nucleic material, the pol gene of which
codes for a polypeptide displaying, for any contiguous
succession of at least 30 amino acids, at least 50%, and
preferably at least 70%, homology with the peptide
sequence SEQ ID NO: 113.

6. Retroviral nucleic material, in which the 3' end of
the gag gene ends at nucleotide 1418 of SEQ ID NO: 130.



7. Retroviral nucleic material, the env gene of which
comprises a nucleotide sequence identical or equivalent to
a sequence chosen from the group consisting of
SEQ ID NO: 117 and its complementary sequences.

8. Retroviral nucleic material, the env gene of which
comprises a nucleotide sequence which begins at
nucleotide 1 of SEQ ID NO: 117 and ends at nucleotide 233
of SEQ ID NO: 114.

9. Retroviral nucleic material, the env gene of which
codes for a polypeptide displaying, for any contiguous
succession of at least 30 amino acids, at least 50%, and
preferably at least 70%, homology with the sequence
SEQ ID NO: 118.

10. Retroviral nucleic material in which the U3R region
of the 3' LTR comprises a nucleotide sequence which ends
at nucleotide 617 of SEQ ID NO: 114.

11. Retroviral nucleic material in which the RU5 region
of the 5' LTR comprises a nucleotide sequence which begins
at nucleotide 755 of SEQ ID NO: 120 and ends at
nucleotide 337 of SEQ ID NO: 141 or SEQ ID NO: 142.

12. Retroviral nucleic material comprising a sequence
which begins at nucleotide 755 of SEQ ID NO: 120 and which
ends at nucleotide 617 of SEQ ID NO: 114.

13. Retroviral nucleic material according to any one of
the preceding claims, characterized in that it is
associated with at least one autoimmune disease such as
multiple sclerosis or rheumatoid arthritis.

14. Nucleotide fragment comprising a nucleotide sequence
chosen from the group consisting of (i) the sequences
SEQ ID NO: 112, SEQ ID NO: 114, SEQ ID NO: 117,
SEQ ID NO: 120, SEQ ID NO: 124, SEQ ID NO: 130,
SEQ ID NO: 141 and SEQ ID NO: 142; (ii) the sequences
complementary to the sequences (i); and (iii) the
sequences equivalent to the sequences (i) or (ii), in
particular the sequences displaying, for any succession of
100 contiguous monomers, at least 50%, and preferably at
least 70%, homology with the sequences (i) or (ii),
respectively.

15. Nucleotide fragment according to claim 14, consisting
of a nucleotide sequence chosen from the group consisting
of (i) the sequences SEQ ID NO: 112, SEQ ID NO: 114,
SEQ ID NO: 117, SEQ ID NO: 120, SEQ ID NO: 124,
SEQ ID NO: 130, SEQ ID NO: 141 and SEQ ID NO: 142; (ii)
the sequences complementary to the sequences (i); and
(iii) the sequences equivalent to the sequences (i) or
(ii), in particular the sequences displaying, for any
succession of 100 contiguous monomers, at least 50%, and
preferably at least 70%, homology with the sequences (i)
or (ii), respectively.

16. Nucleotide fragment comprising a nucleotide sequence
coding for a polypeptide displaying, for any contiguous
succession of at least 30 amino acids, at least 50%, and
preferably at least 70%, homology with a peptide sequence
chosen from the group consisting of SEQ ID NO: 113,
SEQ ID NO: 115, SEQ ID NO: 118, SEQ ID NO: 121,
SEQ ID NO: 135 and SEQ ID NO: 137.

17. Nucleotide fragment according to claim 16, consisting
of a nucleotide sequence coding for a polypeptide
displaying, for any contiguous succession of at least 30
amino acids, at least 50%, and preferably at least 70%,
homology with a peptide sequence chosen from the group
consisting of SEQ ID NO: 113, SEQ ID NO: 115,
SEQ ID NO: 118, SEQ ID NO: 121, SEQ ID NO: 135 and
SEQ ID NO: 137.

18. Nucleic acid probe for the detection of a retrovirus
associated with multiple sclerosis and/or rheumatoid
arthritis, characterized in that it is capable of
hybridizing specifically with any fragment according to
any one of claims 14 to 17 belonging to the genome of said
retrovirus.

19. Probe according to claim 18, characterized in that it
possesses from 10 to 100 nucleotides, preferably from 10
to 30 nucleotides.

20. Primer for the amplification by polymerization of an
RNA or a DNA of a retrovirus associated with multiple
sclerosis and/or rheumatoid arthritis, characterized in
that it comprises a nucleotide sequence identical or
equivalent to at least one portion of the nucleotide
sequence of a fragment according to any one of claims 8 to
11, in particular a nucleotide sequence displaying, for
any succession of 10 contiguous monomers, at least 50%,
preferably at least 70%, homology with at least said
portion of said fragment.

21. Primer according to claim 20, characterized in that
its nucleotide sequence is chosen from SEQ ID NO: 116,
SEQ ID NO: 119, SEQ ID NO: 122, SEQ ID NO: 123,
SEQ ID NO: 126, SEQ ID NO: 127, SEQ ID NO: 128,
SEQ ID NO: 129, SEQ ID NO: 132 and SEQ ID NO: 133.

22. RNA or DNA, and in particular replication and/or
expression vector, comprising a genomic fragment of the
nucleic material according to any one of claims 1 to 7 or
a fragment according to any one of claims 14 to 17.

23. Peptide encoded by any open reading frame belonging
to a nucleotide fragment according to any one of claims 14
to 17, in particular polypeptide, for example oligopeptide
forming or comprising an antigenic determinant recognized
by sera of patients infected with the MSRV-1 virus and/or
in whom the MSRV-1 virus has been reactivated.

24. Peptide according to claim 23, comprising a sequence
identical, partially or totally, or equivalent to a
sequence chosen from SEQ ID NO: 113, SEQ ID NO: 115,
SEQ ID NO: 118, SEQ ID NO: 121, SEQ ID NO: 135 and
SEQ ID NO: 137.

25. Diagnostic, prophylactic or therapeutic composition,
in particular for inhibiting the expression of at least
one retrovirus associated with multiple sclerosis and/or
rheumatoid arthritis, comprising a nucleotide fragment
according to any one of claims 14 to 17.

26. Method for detecting a retrovirus associated with
multiple sclerosis and/or rheumatoid arthritis in a
biological sample, characterized in that an RNA and/or a
DNA presumed to belong to or originating from said
retrovirus, or their complementary RNA and/or DNA, is
brought into contact with a composition comprising a
nucleotide fragment according to any one of claims 14 to
17.
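
Several of the claims above set a threshold that must hold for every run of contiguous monomers (100 nucleotides, 30 amino acids, or 10 nucleotides for primers). The Python sketch below illustrates one way such a sliding-window criterion could be evaluated. It rests on assumptions that are not in the original text: "homology" is read here as simple percent identity, the two sequences are taken to be already aligned position by position without gaps, and the function names are hypothetical.

```python
def window_identities(a, b, window):
    """Percent identity for every run of `window` contiguous positions.

    `a` and `b` must already be aligned position by position (same length,
    no gaps). Identity is the share of positions carrying the same letter.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or window > len(a):
        raise ValueError("window must be between 1 and the sequence length")
    matches = [x == y for x, y in zip(a, b)]
    current = sum(matches[:window])
    result = [100.0 * current / window]
    # Slide the window one position at a time, updating the match count.
    for i in range(window, len(a)):
        current += matches[i] - matches[i - window]
        result.append(100.0 * current / window)
    return result


def meets_threshold(a, b, window, threshold):
    """True if every window reaches at least `threshold` percent identity."""
    return min(window_identities(a, b, window)) >= threshold


# Toy example with a window of 5 (the claims use 100, 30 or 10).
ref = "ACGTACGTAC"
alt = "ACGTTCGTAA"
print(window_identities(ref, alt, 5))    # [80.0, 80.0, 80.0, 80.0, 80.0, 80.0]
print(meets_threshold(ref, alt, 5, 70))  # True
print(meets_threshold(ref, alt, 5, 90))  # False
```

Because the criterion applies to any window, the minimum over all windows is what decides the outcome; a single poorly matching stretch is enough to fail the test even when the overall identity is high.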



ABSTRACT

RETROVIRAL NUCLEIC MATERIAL AND NUCLEOTIDE FRAGMENTS,
IN PARTICULAR ASSOCIATED WITH MULTIPLE SCLEROSIS AND/OR
RHEUMATOID ARTHRITIS, FOR DIAGNOSTIC, PROPHYLACTIC AND
THERAPEUTIC PURPOSES

Nucleic material, in the isolated or purified state, and
nucleotide fragment, comprising a nucleotide sequence
chosen from the group consisting of (i) the sequences
SEQ ID NO: 112, SEQ ID NO: 114, SEQ ID NO: 117,
SEQ ID NO: 120, SEQ ID NO: 124, SEQ ID NO: 130,
SEQ ID NO: 141 and SEQ ID NO: 142; (ii) the sequences
complementary to the sequences (i); and (iii) the
sequences equivalent to the sequences (i) or (ii), in
particular the sequences displaying, for any succession of
100 contiguous monomers, at least 50%, and preferably at
least 70%, homology with the sequences (i) or (ii),
respectively, and uses for detecting a retrovirus
associated with multiple sclerosis and/or rheumatoid
arthritis.
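
Group (ii) above covers the sequences complementary to the listed ones. As an illustration only, the Python sketch below derives the complementary strand of a short DNA sequence, written 5' to 3' (the reverse complement). Reading "complementary" in this way is an assumption, and the function name is hypothetical. The input is SEQ ID NO: 140 as printed in the sequence listing.

```python
# Watson-Crick pairing for DNA; case is preserved.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the complementary strand of `seq`, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


seq_140 = "CTATGTCCTTTTGGACTGTTTGGGT"
print(reverse_complement(seq_140))  # ACCCAAACAGTCCAAAAGGACATAG
```

Applying the function twice returns the original sequence, which is a quick sanity check when transcribing sequences by hand.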



FIG. 2

[Sequence figure: nucleotide sequence, positions 1 to 310, printed in rows of 50 with its translation in the three reading frames beneath each row; stop codons are shown as dots.]
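
The figure layout used here, a nucleotide row followed by three rows of one-letter amino acid codes, corresponds to a translation in the three forward reading frames. The Python sketch below shows how such a translation could be produced. It assumes the standard genetic code and follows the figure's convention of printing a dot for a stop codon; the function names and the 21-nucleotide input are illustrative and not taken from a specific figure.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(seq, frame=0, stop="."):
    """Translate `seq` starting at offset `frame` (0, 1 or 2).

    Stop codons are written as `stop`; codons containing letters other
    than A, C, G, T are written as X.
    """
    seq = seq.upper()
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        protein.append(stop if aa == "*" else aa)
    return "".join(protein)


def three_frames(seq):
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]


example = "ATGGCCCTCCCTTATCATACT"
for number, protein in enumerate(three_frames(example), start=1):
    print(number, protein)
# 1 MALPYHT
# 2 WPSLII
# 3 GPPLSY
```

Only the forward strand is handled; translating the complementary strand would require taking the reverse complement first.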



[Sequence figure, caption illegible in the source: nucleotide sequence, positions 1 to 635, printed in rows of 50 with its translation in the three reading frames beneath each row.]
[Sequence figure, caption illegible in the source, printed over several sheets: nucleotide sequence, positions 1 to 1481, in rows of 50 with its translation in the three reading frames beneath each row.]



FIG. 5

[Sequence figure, printed over several sheets: nucleotide sequence, positions 1 to 1329, in rows of 50 with its translation in the three reading frames beneath each row.]
[Sequence figure, caption not present in the source, printed over several sheets: nucleotide sequence, positions 1 to 1511, in rows of 50 with its translation in the three reading frames beneath each row.]



[Sequence figure, caption not present in the source: nucleotide sequence printed in rows of 50 with the translation of a single reading frame beneath each row; most rows are illegible in the source scan.]
DLT WRDI MLL 
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TGTOQQCTGT GCTOCIGATC CAQCACAQGC GCCCATTQOC TCIOCCAATT 50 
CPLC S.S STG A H C L SQL 
VRC APDP A Q A PIA SPNW 
SAV L L I QHRR PLP LPI 

GQGCTAAAGG CITGCCATIG TIGCIGCACA QCTAAGIGOC TGGGTTCATC 100 
G.R LAIV PAQ LSA WVHP 
AKG LPL FLHS .VP GFI 
GLKA CHC SCT A K C L GSS 

CTAATCGAGC TGAACACTAG TCACIGQGTT GCACGGTICr CITOCATGAC 150 

NRA EH. S LGS TVL FHD 
LIEL NTS HWV PRFS SMT 
. SS T L V TGF HGS LP.P 

GCATOQCTIC TAATAGAGCT ATAACACICA CT3CA3GGTC CAAGATTOCA 200 
PWLL IEL .HS LHGP RFH 
HGF . .SY NTH CMV QDSI 
MAS NRA ITLT AWS KIP 

TIGCTTGGAA TCOGTGAGAC CAAGAAGGOC AGGTCAGAGA ACACAAGQCT 250 
SLE SVRP RTP GQR TQGL 
PWN P.D QEPQ VRE HKA 
FLGI RET KNP RSEN TRL 

TGOCAGCATG TTGGAAGGAG CQCACCAOCA TITTGGAAGC AGQOQGOCAC 300 

PPC WKQ PTTI LEA ARH 
CHHV GSS PPP FWKQ PAT 
ATM LEAA HHH FGS SPPL 

TATCITGQGA GCICIGGGAG CAAQGADCQC AQGTAACAAT TTGGTGACCA 350 
YLGS SGS K DP R.QF GDH 
ILG ALGA RTP GNN LVTT 
SWE LWE QGPQ VTI W.P 

OGAAGGGACC TGAATGOGGA AGCATGAAGG GATCTOCAAA GGAATTGGAA 400 
EGT I R N HEG ISK AIGN 

KGP ESA TMKG SPK QLE 
RRDL NPQ P.R D.LQS NWK 
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ATGTTCCTOC CAAGGCAAAA ATGCCCCTAA GATGTATTCT GGAGAATIGG 450 

VPP KAK MPLR C I L ENW 
MFLP RQK CP. DVFW RIG 
CSS QGKN APK MYS GELG 

GACCAATTIG ACCCICAGAC AGTAAGAAAA AAATGACTTA TATICTICTG 500 
DQFD PQT VRK K.LI FFC 
TNL TLRQ .EK N D L YSSA 
PI. PSD SKKK MTY ILL 

CAGTACCGCC CIGGCCACGA TATCCTCTTC AAGGGGGAGA AACCTQGCCT 550 
STA L A T I SSS RGR NLAS 
VPP WPR YPLQ GGE TWP 
QYRP GHD ILF KGEK P GL 

CCIGAGQGAA GTATAAATTA TAACACCATC TTACAGCTAG AGCT GTJ.T1 G 600 

• GK YKL H H L TAR PVL 

PEGS INY NTI LQLD LFC 
LRE V.II TPS YS. TCFV 

TAGAAAAGGA QGCAAATGGA GTGAAGTGCC ATATITACAA ACTTTCTTTT 650 
. KRR QME .SA IFTN FLF 
RKG GKWS EVP Y L Q TFFS 
EKE ANG VKCH IYK LSF 

CATTAAAAGA CAACICQCAA TTATGTTAAC AGTGTGATTT GIGITCCTAC 700 
IKR QLAI MLT V.F VFLH 
LKD NSQ LC.Q CDL CSY 
H.KT TRN YVN SVIC VPT 

ACGGAAGCCC TCAGATTCTA CICOCCACCC COTGCATCTC CCCTGAATCC 750 

GSP QIL LPTP GIS PES 
TEAL RFY SPP PASP LNP 
RKP SDST PHP RHL P. IP 
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TGTCOQCTGT QCICCIGATC CAQCACAQGC QOOCATIGOC TCTOOCAATT 50 
CPL.C S.S STG AHCL SQL 
VRC APDP A Q A PIA SPNW 
SAV L L I QHRR PLP L P I 

GGGCTAAAGG CTTGOCATTG TTQCTGCACA GCTAAGTGCC TQQGflTCATC 100 
G.R L A I V P A Q LSA WVHP 
AKG L P Li F L H S .VP GFI 
GLKA CHC SCT AKCL GSS 

CTAATCGAGC TGAACACTAG TCACIGQGTT CCAOOGTICT CITCCATCAC 150 

NRA EH..SLGS TVL FHD 
LIEL NTS HWV PRFS SMT 
. SS T L V TGF HGS LP.P 

CCATOQCTIC TAATAGAGCT ATAACACTCA CTGCATGGTC CAAGATTCGA 200 
PWLL IEL .HS LHGP RFH 
HGF . .SY NTH CMV QDSI 
MAS NRA ITLT AWS KIP 

TTCCTTGGAA TQCGTGAGAC CAAGAACOGC AGGTGAGAGA ACACAAGGCT 250 
SLE SVRP RTP GQR TQGL 
PWN P.D QEPQ VRE HKA 
FLGI RET KNP RSEN TR L 

TGCCACCATG TTGGAAGCAG GGCAOCAQCA TITTGGAAGC GGOGOQOCAC 300 

PPC WKQ PTTI LEA ARH 
CHHV GSS PPP FWKR PAT 
ATM LEAA HHH FGS GPPL 

TATCITGGGA GCTCTGQGAG CAAGGAOOOC CAGGTAACAA TITGGTGAOC 350 
YLGS SGS K.DP QVTI W. P 
ILG ALGA RTP R.Q FGDH 
SWE LWE QGPP GNN LVT 

AOGAAGQGAC CIGAATOOGC AAGCATGAAG QGATCIOCAA AGCAATTGGA 400 
RRD LNPQ P.R DLQ SNWK 
EGT .IR NHEG ISK A I G 
TKGP ESA TMK GSPK QLE 
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AAIGTICCTC CCAAQGCAAA AATGGQQCTA AGATGTATTC TGGAGAATIG 450 

CSS QGK NAPK MYS GEL 
NVPP KAK MPL R C I L ENW 
M F L PRQK CP. DVF WRIG 

GGACCAAICT GACCCICAGA CAGTAAGAAA AAAAATGACT TATATICTIC 500 
GPI. PSDSKKKNDLYSS 
DQS DPQT VRK KMT YILL 
TNL TLR Q.EK K.L IFF 

TGCAGTACCG CCTOQCCACG GATA.TCCTCT TCAAGQGGGA GAAACCT3GC 550 
AVP PGHG YPL QGG ETWP 
QYR LAT DILF KGE KPG 
CSTA WPR ISS SRGR NLA 

CICCIGAGGG AAGTATAAAT TATAACACCA TCTEACAGCT AGACCIGTTT 600 

PEG SIN YNTI LQL DLF 
LLRE V.I ITP SYS. TCF 
S.GKYKL .HHLTARPVL 

T3TAGAAAAG GAGGCAAATC GAGIGAAGIG QCATATTTAC AAAC' l ' l ' lUlT 650 
CRKG GKW SEV PYLQ TFF 
VEK EANG VKC HIY KLSF 
. KR RQM E.SA IFT NFL 

TTCATTAAAA GACAACTCGC AATTATGTAA ACAGTGTGAT TTGTGTCCTA 700
SLK DNSQ LCK QCD LCPT 
H.K TTR NY VN SVI CVL 
FIKR QLA IM. TV.F VSY 

CAGGAAGCCC TCAGATCTAC CTCCCTACCC CGGCATCTCC CTGACTCCTT 750

GSPQIYLPTPASP .LL 
QEAL RST SLP RHLP DSF 
RKP SDLP PYP GI S LTPS 

CCCCAACTAA TAAGGACCCA CTTCAGCCCA AACAGTCCAA AAGGACATAG 800
PQLI RTH FSP NSPK GH
PN. . G P T S A Q TVQ KDI
PTN KDP LQPK QSK RT.

GGCATTGATA GCACCCATCA GATGGCCAAA TCATTATTTA CTGGACCAGG 50
GIDS THQ MAK SLFT GPG 
A L I APIR WPN HYL LDQA 
H. . HPS DGQI I I Y WTR 

CCTTTTCAAA ACTATCAAGC AGATAGGGCC CGTGAAGCAT GCCAAAGAAA 100
LFK TIKQ IGP VKH AKEI 
FSK LSS R.GP . S M PKK 
PFQN YQA DRA REAC QRN 

TAATCCCCTG CCTTATCGCC ATGTTCCTTC AGGAGAACAA AGAACAGGCC 150

IPC L I A MFLQ ENK EQA 
. SPA LSP CSF RRTK NRP 
NPL PYRH VPS GEQ RTGH 

ATTACCCAGG GGAAGACTGG CAACTAGATT TTACCCACAT GGCCAAATGT 200
ITQG KTG N.I LPTW PNV 
LPR GRLA TRF YPH GQMS 
Y P G EDW QLDF THM AKC

CAGGGATTTC AGCATCTACT AGTCTGGGCA GATACTTTCA CTGGTTGGGT 250
RDF SIY. SGQ ILS LVGW 
GIS AST SLGR YFH WLG 
QGFQ HLL VWA DTFT GWV 

GGAGTCTTCT CCTTGTAGGA CAGAAAAGAC CCAAGAGGTA ATAAAGGCAC 300

SLL LVG QKRP KR. .RH 
GVFS L.D RKD PRGN KGT 
ESS PCRT EKT QEV IKAL 

TAATGAAATA ATTCCCAGAT TTGGACTTCC CCCAGGATTA CAGGGTGACA 350
. . N N SQI WTS PRIT G.Q
NEI IPRF GLP P G L QGDN
MK. FPD LDFP QDY RVT

ATGGCCCCGC TTTCAAGGCT GCAGTAACCC AGGGAGTATC CCAGGTGTTA 400
WPR FQGC SNP GSI PGVR 
GPA FKA AVTQ GVS QVL 
MAPL SRL Q.P REYP RC. 

GGCATACAAT ATCACTTACA CTGTGCCTGG AGGCCACAAT CCTCCAGAAA 450

HTI SLT LCLE ATI LQK 
GIQY HLH CAW RPQS SRK 
AYN ITYT VPG GHN PPEK 

AGTCAAGAAA ATGAATGAAA CACTCAAAGA TCTAAAAAAG CTAACCCAAG 500
SQEN E.N TQR SKKA NPR 
VKK MNET L K D LKK LTQE 
SRK .MKHSKI .KS . P K 

AAACCCACAT TGCATGACCT GTTCTGTTGC CTATAACCTT ACTAAGAATC 550
NPH CMTC SVA YNL TKNP 
THI A.P VLLP ITL LRI 
KPTL HDL FCC L.PY . E S 

CATAACTATC CCCCAAAAAG CAGGACTTAG CCCATACGAG ATGCTATATG 600

. LS PKK QDLA HTR CYM 
HNYP PKS RT. PIRD A I W 
ITI PQKA GLS PYE MLYG 

GATGGCCTTT CCTAACCAAT GACCTTGTGC TTGACTGAGA AATGGCCAAC 650
DGLS . P M TLC LTEK WPT
MAF PNQ. PCA . L R NGQL
WPF LTN DLVL D.E MAN

TTAGTTGCAG ACATCACCTC CTTAGCCAAA TATCAACAAG TTCTTAAAAC 700
. L Q TSPP . P N INK FLKH
SCR H H L LSQI STS S.N
LVAD ITS LAK YQQV LKT

ATCACAGGGA ACCTGTCCCC GAGAGGAGGG AAAGGAACTA TTCCACCCTG 750

HRE PVP ERRE RNY STL 
ITGN LSP RGG KGTI PPW 
SQG TCPR EEG KEL FHPG 

GTGACATC 758 
V T 
. H 
D M 



